naive performance benchmarks #8

timelyportfolio · 2015-08-12T20:34:21Z

Thanks so much for listenv. It is a great idea, and I have already enjoyed using it. I am in over my head, but I thought some naive performance benchmarks would be interesting, since environments usually offer a big performance boost (see http://jeffreyhorner.tumblr.com/post/117059271933/hash-table-performance-in-r-part-iv).

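library(microbenchmark)
library(listenv)
library(ggplot2)  # provides the autoplot() generic used at the end

n <- 2e4

# Fill a plain environment, using the index converted to a string as the key
env_env <- function(){
  env_env <- new.env()
  for(x in seq_len(n)){
    env_env[[as.character(x)]] <- runif(1)
  }
  env_env
}

# Fill a list environment by integer index
list_env <- function(){
  list_env <- listenv()
  for(x in seq_len(n)){
    list_env[[x]] <- runif(1)
  }
  list_env
}

# Fill a plain list by integer index, growing it one element at a time
list_list <- function(){
  list_list <- list()
  for(x in seq_len(n)){
    list_list[[x]] <- runif(1)
  }
  list_list
}

mb <- microbenchmark(
  env_env()
  ,list_env()
  ,list_list()
  ,times=10
)

autoplot(mb)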


HenrikBengtsson · 2015-08-14T12:46:30Z

Thanks for this benchmarking feedback.

First of all, none of the code has been optimized for speed. Having said this, I always try to be careful not to waste cycles. It's not obvious to me how to speed it up, but map() and map()<- are called via method dispatch, as are some internal calls to assign_by_index(), and that dispatching may add unnecessary overhead.

HenrikBengtsson · 2015-08-14T13:02:41Z

Also, when assigning by index, a new dummy/auxiliary variable has to be created inside the list environment. This is done on the fly using tempvar(), which may also add unnecessary overhead. One idea to speed this up for list environments is to occasionally create a chunk/pool of dummy variable names that one pulls from until it is empty (at which point a new pool is created). However, there's still a need to assert that there's no name clash.
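
The pooling idea could look roughly like the sketch below. It is not listenv's implementation: make_name_pool(), its size argument, and the ".listenv_var_" prefix are hypothetical names chosen for illustration. It generates a batch of candidate names, drops any that already exist in the environment, and hands them out one at a time, refilling when the pool runs dry.

```r
# Hypothetical sketch of a pooled name generator; not part of listenv.
make_name_pool <- function(envir, size = 100L, prefix = ".listenv_var_") {
  pool <- character(0L)
  counter <- 0L

  # Build a new batch of candidate names, skipping any already in use
  refill <- function() {
    while (length(pool) == 0L) {
      candidates <- paste0(prefix, counter + seq_len(size))
      counter <<- counter + size
      # One clash check per batch instead of one exists() call per name
      taken <- ls(envir = envir, all.names = TRUE)
      pool <<- setdiff(candidates, taken)
    }
  }

  # Return a function that hands out the next unused name
  function() {
    if (length(pool) == 0L) refill()
    name <- pool[[1L]]
    pool <<- pool[-1L]
    name
  }
}

envir <- new.env()
assign(".listenv_var_2", "user-defined", envir = envir)  # simulate a name clash
next_name <- make_name_pool(envir, size = 3L)
names_drawn <- c(next_name(), next_name(), next_name())
print(names_drawn)
```

Note that the clash check here only covers names present when the pool is refilled. A variable assigned directly between a refill and the moment a pooled name is used could still collide, which is the remaining problem mentioned above.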

timelyportfolio · 2015-08-14T13:05:43Z

Great, thanks for the feedback. I really just wanted to pass along the idea. I'll try to play with it in lineprofiler later today.

HenrikBengtsson · 2015-10-17T18:42:36Z

About list environments:
From a very basic lineprof run, it turns out that much of the time is spent in tempvar(). Moreover, within that function, base::exists() is what takes up most of the time. tempvar() is currently very conservative: it assumes nothing about what exists in the list environment, so it generates a random variable name and checks (via exists()) whether that name already happens to exist. It is written this way so that users can assign variables directly via assign(), although that is not a common use case for list environments. If one removed that protection, we could probably generate new unique variable names without having to assert uniqueness via exists().
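
To make the trade-off concrete, here is a minimal sketch of the two strategies. Neither function is listenv's actual code: tempvar_checked(), make_tempvar_counter(), and the "var_" prefix are hypothetical stand-ins. The first mimics the conservative approach of checking every random name with exists(); the second uses a counter and is only safe under the assumption that nothing else assigns names with the same prefix.

```r
# Hypothetical sketch; listenv's real tempvar() may differ.

# Conservative approach: draw random names until one is unused
tempvar_checked <- function(envir, prefix = "var_") {
  repeat {
    name <- paste0(prefix, sample.int(.Machine$integer.max, size = 1L))
    if (!exists(name, envir = envir, inherits = FALSE)) return(name)
  }
}

# Optimistic approach: a counter yields unique names without calling exists(),
# but only if no other code assigns names with this prefix
make_tempvar_counter <- function(prefix = "var_") {
  count <- 0L
  function() {
    count <<- count + 1L
    paste0(prefix, count)
  }
}

envir <- new.env()
tempvar_fast <- make_tempvar_counter()
for (i in 1:3) assign(tempvar_fast(), i, envir = envir)

# The checked variant still returns a name that is free in envir
checked_name <- tempvar_checked(envir)
```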

About lists (not list environments): If you know the length upfront, it's significantly faster to pre-allocate the list, i.e. x <- vector("list", length=n) or x <- list(); length(x) <- n. I suspect R is not very smart in how it grows a list; maybe it allocates a new list of length(x)+1 and copies all of the elements over. If so, growing the list element by element will be very expensive.
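
A small base-R sketch of that comparison follows. The function names grow_list() and prealloc_list() are made up for illustration, and the size of the gap is an assumption to check on your own setup, since it depends on the R version and machine.

```r
n <- 1e4

# Grow the list one element at a time
grow_list <- function(n) {
  x <- list()
  for (i in seq_len(n)) x[[i]] <- i
  x
}

# Allocate all n slots up front, then fill them
prealloc_list <- function(n) {
  x <- vector("list", length = n)
  for (i in seq_len(n)) x[[i]] <- i
  x
}

grown <- grow_list(n)
prealloc <- prealloc_list(n)

# Timings vary by machine and R version, so no expected numbers are shown
print(system.time(grow_list(n)))
print(system.time(prealloc_list(n)))
```

Both functions return the same list, so pre-allocating changes only how the memory is obtained, not the result.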


HenrikBengtsson added the enhancement label Aug 14, 2015

HenrikBengtsson added a commit that referenced this issue Oct 17, 2015

Internal assign_by_index() etc are now plain functions [#8]

cbbedd6

HenrikBengtsson added a commit that referenced this issue Oct 17, 2015

Now map() accessors no longer uses inherits=TRUE [#8]

b32ecb8

HenrikBengtsson added a commit that referenced this issue Oct 17, 2015

CLEANUP: map() and map()<- are now plain functions + latter no longer…

11cf21f

… exported [#8]. map() is used by future package, so still needs to be exported.
