naive performance benchmarks #8

timelyportfolio opened this issue Aug 12, 2015 · 4 comments

@timelyportfolio

Thanks so much for listenv. It is a great idea, and I have already enjoyed using it. I am in over my head, but I thought some naive performance benchmarks would be interesting, since environments usually offer a big performance boost (see http://jeffreyhorner.tumblr.com/post/117059271933/hash-table-performance-in-r-part-iv).

library(microbenchmark)
library(listenv)
library(ggplot2)  # provides the autoplot() generic used below

n <- 2e4

env_env <- function(){
  env_env <- new.env()
  for(x in seq.int(1,n)){
    env_env[[as.character(x)]] <- runif(1)
  }  
  env_env
}

list_env <- function(){
  list_env <- listenv()
  for(x in seq.int(1,n)){
    list_env[[x]] <- runif(1)
  }
  list_env
}

list_list <- function(){
  list_list <- list()
  for(x in seq.int(1,n)){
    list_list[[x]] <- runif(1)
  }
  list_list
}

mb <- microbenchmark(
  env_env()
  ,list_env()
  ,list_list()
  ,times=10
)

autoplot(mb)

[image: plot produced by autoplot(mb)]

@HenrikBengtsson
Collaborator

Thanks for this benchmarking feedback.

First of all, none of the code has been optimized for speed. Having said this, I always try to be careful not to waste cycles. It's not obvious to me how to speed it up, but a few calls to map() and map()<- go through method dispatch, as do some internal calls to assign_by_index(), and that may add unnecessary overhead.

@HenrikBengtsson
Collaborator

Also, when assigning by index, there is a need to create a new dummy/auxiliary variable inside the list environment. This is done on the fly using tempvar(), which may also add unnecessary overhead. One idea to speed that up for list environments is to create a chunk/pool of dummy variables once in a while and pull from it until it is empty (at which point a new pool is created). However, there's still a need to assert that there's no name clash.
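
A minimal sketch of the pooling idea, assuming nothing about how listenv implements tempvar() internally. The helper make_name_pool(), its arguments, and the ".listenv_var_" prefix are hypothetical names made up for illustration. Name clashes are checked once per refill rather than once per name, so a name assigned directly by the user after a refill would still go undetected.

```r
# Hypothetical helper (not part of listenv): returns a function that hands
# out variable names unused in 'envir', refilling a pool of 'size' candidates.
make_name_pool <- function(envir, size = 100L, prefix = ".listenv_var_") {
  pool <- character(0L)
  counter <- 0L

  refill <- function() {
    candidates <- paste0(prefix, counter + seq_len(size))
    counter <<- counter + size
    # One vectorised clash check per refill instead of one exists() per name.
    pool <<- setdiff(candidates, ls(envir = envir, all.names = TRUE))
  }

  function() {
    # Refill until at least one clash-free name is available.
    while (length(pool) == 0L) refill()
    name <- pool[[1L]]
    pool <<- pool[-1L]
    name
  }
}

env <- new.env()
next_name <- make_name_pool(env, size = 3L)
assign(".listenv_var_2", "taken", envir = env)  # simulate a name clash
for (value in c(10, 20, 30)) assign(next_name(), value, envir = env)
used <- sort(ls(envir = env, all.names = TRUE))
```

Here the clashing name is skipped, and the third assignment triggers a second refill.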

@timelyportfolio
Author

Great, thanks for the feedback. I really just wanted to pass along the idea. I'll try to play with it in lineprofiler later today.

@HenrikBengtsson
Collaborator

About list environments:
From a very basic lineprof run, it turns out that much of the time is spent in tempvar(). Moreover, within that function, base::exists() is what takes up most of the time. tempvar() is currently very conservative: it assumes nothing about what exists in the list environment when it generates a new random variable name, so it checks (via exists()) whether each candidate name already happens to exist. It is written this way to allow users to assign variables directly via assign(), although that is not a common use case for list environments. If that protection were removed, we could probably generate new unique variable names without having to assert uniqueness via exists().
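
To make the trade-off concrete, here is a rough sketch contrasting the two strategies. Neither function is listenv's actual tempvar(); random_name(), make_counter_names(), and the "var_" prefix are hypothetical and only illustrate the idea under the assumption that nothing else writes names with the same prefix into the environment.

```r
# Conservative strategy, in the spirit of the description above: draw random
# names until exists() reports that one is unused in 'envir'.
random_name <- function(envir, prefix = "var_") {
  repeat {
    name <- paste0(prefix, sample.int(.Machine$integer.max, size = 1L))
    if (!exists(name, envir = envir, inherits = FALSE)) return(name)
  }
}

# Counter-based strategy: no exists() call at all. Names are unique only as
# long as nobody assigns variables with this prefix directly, e.g. via assign().
make_counter_names <- function(prefix = "var_") {
  count <- 0L
  function() {
    count <<- count + 1L
    paste0(prefix, count)
  }
}

env <- new.env()
checked <- random_name(env)

next_name <- make_counter_names()
unchecked <- vapply(1:5, function(i) next_name(), character(1L))
```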

About lists (not list environments): If you know the length up front, it's significantly faster to pre-allocate the list, i.e. x <- vector("list", length=n) or x <- list(); length(x) <- n. I suspect R is not very smart in how it grows a list; maybe it allocates a new list of length(x)+1 and copies all of the elements over. If so, growing the list element by element will be very expensive.
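
A small sketch for comparing the two approaches on your own machine, using only base R. The function names grow_list() and prealloc_list() are made up for this example, and no particular timings are claimed, since how much pre-allocation helps may depend on the R version.

```r
n <- 2e4

# Grow the list one element at a time, as in list_list() above.
grow_list <- function(n) {
  x <- list()
  for (i in seq_len(n)) x[[i]] <- runif(1)
  x
}

# Pre-allocate a list of the final length, then fill it in place.
prealloc_list <- function(n) {
  x <- vector("list", length = n)
  for (i in seq_len(n)) x[[i]] <- runif(1)
  x
}

# Elapsed wall-clock time in seconds for each variant.
t_grow <- system.time(a <- grow_list(n))[["elapsed"]]
t_prealloc <- system.time(b <- prealloc_list(n))[["elapsed"]]
c(grow = t_grow, prealloc = t_prealloc)
```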

HenrikBengtsson added a commit that referenced this issue Oct 17, 2015
… exported [#8]. map() is used by future package, so still needs to be exported.