Object Oriented Programming in R Part 2

Daniel Jacobs. Presented at Advanced R Book Club
9/7/2019

[Figure: OO YEAH!]

S3 Review

S3=>

  • Commonly used generics ( summary, print etc. )
  • Common classes: e.g. factor
  • R's type system(s) ( base types versus class types )
  • Common functions: UseMethod(), class()

Goals

Today we will cover:

  • R6
  • S4
  • Tradeoffs

Expect to understand:

  • when to choose one OO system versus another
  • a lot more about Object Oriented Programming
  • A little bit of syntax

Relative Usage

[Plot: relative usage]

OO Concepts

Raise your hand if you are sorta familiar with…

  • Polymorphism
  • Encapsulation
  • Getters/setters
  • Reference Semantics
  • Multiple Dispatch

R6

Key Properties

  • Uses encapsulated OOP syntax : object$method()
  • Objects are mutable!

Advantages over S3

  • Reference Semantics
  • Familiar to non-R developers
  • Methods don't pollute the global namespace

Disadvantages compared with S3

  • Reference semantics

R6::R6Class

Only one function you need: R6Class()

library(R6)
Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  })
)

Instantiation

acc = Accumulator$new()

Access properties with $

acc$sum
[1] 0
acc$add(1)
acc$sum
[1] 1

Method Chaining

Method Chaining (without the %>%!)

acc$add(10)$add(10)$sum
[1] 21

Initialization

Defining $initialize will modify the behavior of “new”

Person <- R6Class("Person", list(
  name = NULL,
  age = NA_real_,
  initialize = function(name, age = NA_real_) {  # NA_real_ so the default passes is.numeric()
    stopifnot(is.character(name), length(name) == 1)
    stopifnot(is.numeric(age), length(age) == 1)

    self$name <- name
    self$age <- age
  }
))

Printing

Defining $print will modify the default printing of the object
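As a minimal sketch of a custom print method (a simplified Person class without the input checks; the exact output layout is just an illustration):

```r
library(R6)

# Simplified Person class with a custom print method.
Person <- R6Class("Person", list(
  name = NULL,
  age = NA,
  initialize = function(name, age = NA) {
    self$name <- name
    self$age <- age
  },
  print = function(...) {
    cat("Person:\n")
    cat("  Name: ", self$name, "\n", sep = "")
    cat("  Age:  ", self$age, "\n", sep = "")
    invisible(self)  # returning self keeps the object chainable
  }
))

hadley <- Person$new("Hadley", age = 38)
print(hadley)  # the print method for R6 objects calls hadley$print()
printed <- capture.output(print(hadley))
```

Typing hadley at the console would print the same thing, because auto-printing goes through print().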

Inheritance

Looks like this

AccumulatorChatty <- R6Class("AccumulatorChatty",
  inherit = Accumulator,
  public = list(
    add = function(x = 1) {
      cat("Adding ", x, "\n", sep = "")
      super$add(x = x)  # super$ plays the role that NextMethod() plays in S3
    }
  )
)

Introspection

names(acc)
[1] ".__enclos_env__" "sum"             "clone"           "add"            

Other things

Getters and setters -> yes, it's a thing (active bindings, via the 'active' argument of R6Class).

Public/private elements -> also a thing (via the 'private' argument of R6Class).
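A small sketch of both features together. The Account class, its .balance field, and the validation rule are hypothetical and only here for illustration; the 'private' and 'active' arguments are the R6Class arguments that provide private elements and getter/setter behavior.

```r
library(R6)

Account <- R6Class("Account",
  public = list(
    initialize = function(balance = 0) {
      private$.balance <- balance
    },
    deposit = function(x) {
      private$.balance <- private$.balance + x
      invisible(self)
    }
  ),
  private = list(
    .balance = 0   # only reachable from inside methods, via private$
  ),
  active = list(
    # Active binding: looks like a field, runs like a function.
    balance = function(value) {
      if (missing(value)) {
        private$.balance               # getter
      } else {
        stopifnot(is.numeric(value), length(value) == 1, value >= 0)
        private$.balance <- value      # setter with validation
        invisible(self)
      }
    }
  )
)

acct <- Account$new(100)
acct$deposit(50)
acct$balance          # 150
acct$balance <- 10    # goes through the setter
acct$.balance         # NULL: the private field is not visible from outside
```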

Reference Semantics

a1 = c(0)
a2 = a1
a2 = a2 + 10
y1 <- Accumulator$new()
y1$sum
[1] 0
y2 <- y1

y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)

Reference Semantics

a1 = c(0)
a2 = a1
a2 = a2 + 10
c( a1 = a1, a2 = a2 )
a1 a2 
 0 10 
y1 <- Accumulator$new()
y2 <- y1
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
y1 y2 
10 10 
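When an independent copy is wanted, R6 objects provide a clone() method. A minimal sketch, repeating the Accumulator class so that it runs on its own:

```r
library(R6)

Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  }
))

y1 <- Accumulator$new()
y2 <- y1$clone()   # independent copy, unlike y2 <- y1
y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)   # y1 is 10, y2 is still 0
```

If a field itself holds another R6 object, clone(deep = TRUE) is needed to copy that nested object as well.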

Threading State

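How would you implement a stack in R? Ideally it would be used like this:

stack = new_stack(c(1, 2, 3))

stack = push(stack, 4)  # this is ok: push returns the updated stack
result = pop(stack)
result # want 4
result = pop(stack)
result # want 3, but with copy semantics pop() cannot update stack
result = pop(stack)
result # want 2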

Stack

new_stack <- function(items = list()) {
  structure(list(items = items), class = "stack")
}

push <- function(x, y) {
  x$items <- c(x$items, list(y))
  x
}

pop <- function(x) {
  n <- length(x$items)

  item <- x$items[[n]]
  x$items <- x$items[-n]

  list(item = item, x = x) # UGLY: has to return both the item and the updated stack
}
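For comparison, here is a sketch of the same stack with reference semantics. The Stack class name and its method names are made up for this example; the point is that pop() can return just the item because the object is modified in place.

```r
library(R6)

Stack <- R6Class("Stack", list(
  items = list(),
  initialize = function(items = list()) {
    self$items <- as.list(items)
  },
  push = function(x) {
    self$items <- c(self$items, list(x))
    invisible(self)
  },
  pop = function() {
    n <- length(self$items)
    stopifnot(n > 0)               # popping an empty stack is an error
    item <- self$items[[n]]
    self$items <- self$items[-n]   # the stack itself is updated
    item                           # so only the item needs to be returned
  }
))

s <- Stack$new(c(1, 2, 3))
s$push(4)
s$pop()  # 4
s$pop()  # 3
s$pop()  # 2
```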

Group Discussion

There is a package called 'progress' for rendering a progress bar. Why does it use R6?

library(progress)
pb <- progress_bar$new(
    format = "(:spin) [:bar] :percent",
    total = 30, clear = FALSE, width = 60)
for (i in 1:30) {
    pb$tick()
    Sys.sleep(40 / 100)
}

R6 Versus RC

RC is a 'reference class' system built into R. Hadley prefers R6 because it's simpler and the docs are better: https://r6.r-lib.org

"Learn R6", says @hadley

Discussion: Are there patterns here?

R6 is used in:

  • testthat ( reporters )
  • callr ( for external R sessions )
  • readr ( for post-reading callbacks when reading streams )
  • httpuv ( assorted streams; webservers )
  • progress ( for progress bars! )
  • shiny ( uses R6 all over the place )

Discussion: What commonalities do these packages have?

S4

  • OO System
  • Similar in spirit to S3, but stricter and more complicated
  • Implemented within the 'methods' package

S4 Advantages (Over S3)

Advantages

  • Multiple Dispatch
  • Object Validation
  • API vs internals

Disadvantages

  • Complicated ( the 'methods' package, which implements S4, has over 216 functions! )

Resources

Sample Exercise

lubridate::period() returns an S4 class. What slots does it have? What class is each slot? What accessors does it provide?

Results

library(lubridate)
p = lubridate::period(12)
p@.Data  # this is how you access a 'slot'
[1] 12
str(p)
Formal class 'Period' [package "lubridate"] with 6 slots
  ..@ .Data : num 12
  ..@ year  : num 0
  ..@ month : num 0
  ..@ day   : num 0
  ..@ hour  : num 0
  ..@ minute: num 0

Getting help for a class

class?Period
class?h2o::H2OModel

Sample Packages

Here are some packages that use S4

  1. DBI (database wrapper)
  2. Rcpp ( C++ wrapper )
  3. h2o ( H2O.ai wrapper )
  4. Bioconductor ( suite of genomics libraries )
  5. lubridate ( time parsing )
  6. colorspace
  7. Matrix

Notice any patterns? It's not obvious!

Sample Packages

  1. DBI
  2. Rcpp
  3. h2o ( also available in Java, Python, etc. )

What do these have in common?

S4 - Class Definition

Answer: they map concepts from more formal object-oriented languages into R.

Here are a bunch of examples of how S4 works.

setClass("Person",
  slots = c(
    name = "character",
    age = "numeric"
  ))

Construct an object with “new”

john <- new("Person", name = "John Smith", age = NA_real_)

Slots

S4 classes have slots ( Similar to attributes )

Access a “slot” with the '@' symbol

john@name
[1] "John Smith"

Hadley recommends creating getters and setters, and only using '@' inside your own methods.
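Slots can also be inspected programmatically, which is handy for exercises like the lubridate one. A short sketch using the Person class (redefined here so the block runs on its own):

```r
library(methods)

setClass("Person",
  slots = c(
    name = "character",
    age = "numeric"
  ))
john <- new("Person", name = "John Smith", age = NA_real_)

slotNames("Person")    # "name" "age"
getSlots("Person")     # named vector mapping each slot to its class
slot(john, "name")     # same as john@name
is(john, "Person")     # TRUE
isVirtualClass("Person")  # FALSE: the class can be instantiated
```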

Getters and Setters

First create generics

setGeneric("age", function(x) standardGeneric("age"))
[1] "age"
setGeneric("age<-", function(x, value) standardGeneric("age<-"))
[1] "age<-"

Then we can create getters and setters

setMethod("age", "Person", function(x) x@age)
setMethod("age<-", "Person", function(x, value) {
  x@age <- value
  x
})

age(john) <- 50
age(john)
[1] 50

Documentation

To print an S4 object:

show(john)
An object of class "Person"
Slot "name":
[1] "John Smith"

Slot "age":
[1] 50

To look up documentation:

Look up S4 class docs with class?ClassName, for example class?Period

Prototype

You can set a default value:

setClass("Person",
  slots = c(
    name = "character",
    age = "numeric"
  ),
  prototype = list(
    name = NA_character_,
    age = NA_real_
  )
)

me <- new("Person", name = "Hadley")
str(me)
Formal class 'Person' [package ".GlobalEnv"] with 2 slots
  ..@ name: chr "Hadley"
  ..@ age : num NA

Inheritance

setClass("Employee",
  contains = "Person",  #This is how it knows to inherit!
  slots = c(
    boss = "Person"
  ),
  prototype = list(
    boss = new("Person")
  )
)
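A sketch of what inheritance buys you. The describe generic is a hypothetical name used only for this example; callNextMethod() is the S4 counterpart of NextMethod() in S3 and super$ in R6.

```r
library(methods)

setClass("Person",
  slots = c(name = "character", age = "numeric"),
  prototype = list(name = NA_character_, age = NA_real_)
)
setClass("Employee",
  contains = "Person",
  slots = c(boss = "Person"),
  prototype = list(boss = new("Person"))
)

# Hypothetical generic with one method per class.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "Person", function(x) paste("Person:", x@name))
setMethod("describe", "Employee", function(x) {
  # Reuse the Person method, then add to its result.
  paste(callNextMethod(), "(employee)")
})

emp <- new("Employee", name = "Alex", age = 30)
is(emp, "Person")   # TRUE: an Employee is also a Person
is(emp)             # "Employee" "Person"
describe(emp)       # "Person: Alex (employee)"
```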

BE CAREFUL DURING DEVELOPMENT

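If you instantiate an object and then redefine its class, it doesn't go well! The existing object no longer matches the class definition, so recreate your objects after changing a class.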

Public Constructors

Hadley and Martin Morgan suggest using 'new' only internally and exposing a separate constructor function that can clean up its inputs and give friendly messages:

Person <- function(name, age = NA) {
  age <- as.double(age)

  new("Person", name = name, age = age)
}

Validators

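The validity function is called automatically when you call 'new'. It should return TRUE for a valid object, or a character vector describing the problems, in which case 'new' throws an error.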

setValidity("Person", function(object) {
  if (length(object@name) != length(object@age)) {
    "@name and @age must be same length"
  } else {
    TRUE
  }
})
try({
  new("Person", name = "Danimal", age = c(10, 23))
})
Error in validObject(.Object) : 
  invalid class "Person" object: @name and @age must be same length

Validator Example

You can call validObject() at any time to check an object. It is only called automatically in the constructor, so a later slot assignment with '@' is not checked.

try({
alex <- Person("Alex", age = 30)
alex@age <- 1:10
validObject(alex)
})
Error in validObject(alex) : 
  invalid class "Person" object: @name and @age must be same length
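One way to close that gap is to call validObject() inside the setter, so that invalid modifications fail right away. A sketch, assuming the same Person class and validity rule as above (redefined here so the block runs on its own):

```r
library(methods)

setClass("Person", slots = c(name = "character", age = "numeric"))
setValidity("Person", function(object) {
  if (length(object@name) != length(object@age)) {
    "@name and @age must be same length"
  } else {
    TRUE
  }
})

setGeneric("age<-", function(x, value) standardGeneric("age<-"))
setMethod("age<-", "Person", function(x, value) {
  x@age <- value
  validObject(x)   # re-check the object after modifying it
  x
})

alex <- new("Person", name = "Alex", age = 30)
age(alex) <- 31                               # valid, goes through
res <- try(age(alex) <- 1:10, silent = TRUE)  # invalid, throws an error
alex@age                                      # still 31
```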

Generics

Create a new generic like this

setGeneric("myGeneric", function(x) standardGeneric("myGeneric"))
[1] "myGeneric"

Instead of this:

myGeneric <- function(x) UseMethod("myGeneric")

Methods

Arguments are: the name of the generic, the 'signature' ( the class or classes to dispatch on ), and the function

setMethod("myGeneric", "Person", function(x) {
  # method implementation
})

For reference

methods("generic")               # to find all methods for a generic
methods(class = "class")         # to find all methods for a class
selectMethod("generic", "class") # to see an implementation
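A runnable version of this kind of introspection, using a hypothetical greet generic on a Person class (both defined here only for the example):

```r
library(methods)

setClass("Person", slots = c(name = "character", age = "numeric"))
setGeneric("greet", function(x) standardGeneric("greet"))
setMethod("greet", "Person", function(x) paste("Hello,", x@name))

isGeneric("greet")               # TRUE
existsMethod("greet", "Person")  # TRUE: a method is defined for exactly this class
showMethods("greet")             # prints the signatures that have methods
selectMethod("greet", "Person")  # the method that dispatch would pick

ann <- new("Person", name = "Ann", age = 40)
greet(ann)                       # "Hello, Ann"
```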

Show

How to look up an existing generic:

args(getGeneric("show"))
function (object) 
NULL

How to implement an existing generic:

setMethod("show", "Person", function(object) {
  cat(is(object)[[1]], "\n",
      "  Name: ", object@name, "\n",
      "  Age:  ", object@age, "\n",
      sep = ""
  )
})

Sample Packages

Here are some packages that use S4

Package        Explained?
DBI            OK
Rcpp           OK
h2o            OK
Bioconductor   ?
lubridate      ?
colorspace     ?
Matrix         ?

Bioconductor

  • “Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.”
  • Commonly used domain-specific generics (e.g. organism(), score(), strand(), genome(), seqinfo())
  • 1700 packages

GROUP DISCUSSION

The lubridate package has periods, intervals, and durations.

  • Durations: the way physicists think about time ( it always goes up, one second at a time )

  • Periods: the way humans think about time. Times are measured in years, months and days. However! The number of seconds in a period might vary, based on when the period started ( thanks to leap years )

  • Intervals: a number of seconds that starts at a specific date

  • Numeric: sometimes you just want to represent a number as a number of seconds

Why does lubridate use S4? (Or rather, what might be difficult in S3?)

Multiple Dispatch

Answer: lubridate wants to combine these classes in all kinds of ways. That's clunky in S3.

There is no easy way in S3 to change your method definition based on the types of two arguments!

`%within%.Period` <- function(x, val) {
  # S3 dispatches on x only, so the class of val has to be checked by hand
  if (is.numeric(val)) {
    # ....
  } else if (inherits(val, "Period")) {
    # ....
  } else if (inherits(val, "POSIXct")) {
    # ....
  }
}

Answer

For multiple dispatch ( which is annoying in S3)

#' @export
setMethod("+", signature(e1 = "Period", e2 = "Period"),
          function(e1, e2) add_period_to_period(e2, e1))

#' @export
setMethod("+", signature(e1 = "Period", e2 = "Date"),
          function(e1, e2) add_period_to_date(e1, e2))

#' @export
setMethod("+", signature(e1 = "Period", e2 = "numeric"),
          function(e1, e2) add_number_to_period(e2, e1))

#' @export
setMethod("+", signature(e1 = "Period", e2 = "POSIXct"),
          function(e1, e2) add_period_to_date(e1, e2))

#' @export
setMethod("+", signature(e1 = "Period", e2 = "POSIXlt"),
          function(e1, e2) add_period_to_date(e1, e2))

#' @export
setMethod("+", signature(e1 = "Date", e2 = "Duration"),
          function(e1, e2) add_duration_to_date(e2, e1))

#' @export
setMethod("+", signature(e1 = "Date", e2 = "Period"),
          function(e1, e2) add_period_to_date(e2, e1))

#' @export
setMethod("+", signature(e1 = "difftime", e2 = "Duration")
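The methods above rely on helpers inside the package, so they do not run on their own. Below is a self-contained sketch of the same idea with toy classes. Seconds, Minutes, and the combine generic are hypothetical names invented for this example and are not lubridate's.

```r
library(methods)

# Hypothetical toy classes, not lubridate's real ones.
setClass("Seconds", slots = c(n = "numeric"))
setClass("Minutes", slots = c(n = "numeric"))

# The generic dispatches on the classes of BOTH arguments.
setGeneric("combine", function(x, y) standardGeneric("combine"))

setMethod("combine", signature("Seconds", "Seconds"),
  function(x, y) new("Seconds", n = x@n + y@n))
setMethod("combine", signature("Seconds", "Minutes"),
  function(x, y) new("Seconds", n = x@n + 60 * y@n))
setMethod("combine", signature("Minutes", "Seconds"),
  function(x, y) combine(y, x))   # reuse the method above with arguments swapped
setMethod("combine", signature("Seconds", "numeric"),
  function(x, y) new("Seconds", n = x@n + y))

combine(new("Seconds", n = 30), new("Minutes", n = 2))@n  # 150
combine(new("Minutes", n = 2), new("Seconds", n = 30))@n  # 150
combine(new("Seconds", n = 30), 15)@n                     # 45
```

Each combination of argument classes gets its own small method, instead of one function with a chain of class checks.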

Multiple Dispatch and Multiple Inheritance

Just because something is possible doesn't mean it is a good idea.

Why People Use S4

Either

  1. Multiple Dispatch is important ( lubridate; colorspace; Matrix )
  2. It's a large project with many developers ( Bioconductor, which is maintained by Martin Morgan )
  3. Someone is porting a package from an OO language into R ( DBI; Rcpp; h2o )

Now you can use factor in S4 classes.

Common packages with S4

Package( classes )

  • Rcpp ( C++Class; C++Object; C++Function; Module,etc. )
  • colorspace ( color; RGB; HSV; polarLUV; etc. )
  • lubridate ( Duration; Interval; Period; Timespan )
  • DBI ( DBIConnection; DBIDriver; etc. )
  • Matrix ( complicated object hierarchy )

When to use

  1. Use S3 by default because it is simple and feels like R
  2. Use S4 for
    1. projects with many developers;
    2. thin wrappers of code from Java, Python, etc. (sometimes)
    3. fancy arithmetic for things like times, colors, etc. ( using multiple dispatch )
  3. Use R6 when you
    1. need reference semantics, for things like user interfaces or external sockets

Thanks, and one more thing...

Don't you remember?

The more times you remember a thing, the less likely you are to forget it.

Become a beta tester! www.dontyouremember.com

Dont You Remember

Try out www.dontyouremember.com

  1. Install
  2. Use R
  3. Do Your Flashcards Daily
  4. Remember what you learned!