Overhead #2

Open
wlandau opened this issue Feb 13, 2018 · 2 comments

Comments

@wlandau

wlandau commented Feb 13, 2018

I am really happy to see future extend to so many forms of parallelism!

Is there anything we can do about overhead?

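library(future.callr)
library(future.apply)   # future_lapply() now lives in future.apply
library(microbenchmark)
library(parallel)

# Ten small tasks: each draws 100 normal variates and averages them.
n <- rep(1e2, 10)
f <- function(n){
  mean(rnorm(n))
}

# Baseline: forked processes via mclapply().
options(mc.cores = 10)
x <- microbenchmark(mclapply(n, f))

# Same tasks through the callr backend.
plan(future.callr::callr)
y <- microbenchmark(future_lapply(n, f))
rbind(x, y)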

## Unit: milliseconds
##                expr       min          lq       mean     median         uq      max neval cld
##      mclapply(n, f)    8.9127    9.696006   10.41556   10.39014   10.92663   13.886   100  a
## future_lapply(n, f) 2982.5224 3075.635802 3140.52041 3090.21414 3101.86924 6304.660   100   b
@HenrikBengtsson
Collaborator

I think the fair comparison would be future::multisession (aka parallel's SOCK clusters) and maybe future.batchtools::batchtools_local (write to file and launch). mclapply() (aka FORKed processes) is kind of different, with its own pros and cons.

I haven't done any profiling whatsoever, but if there is a big difference between plan(callr) and plan(multisession), I'd expect the overhead to come from the callr framework itself.
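A minimal sketch of the comparison suggested here, assuming the future, future.apply, future.callr, and microbenchmark packages are installed. The worker count (2) and the number of repetitions (5) are arbitrary choices for illustration, not values from this thread, and no timings are claimed.

```r
library(future)
library(future.apply)
library(future.callr)
library(microbenchmark)

# Same toy workload as in the original report.
n <- rep(1e2, 10)
f <- function(n) mean(rnorm(n))

# Time the same call under the multisession backend (PSOCK workers).
plan(multisession, workers = 2L)
t_multisession <- microbenchmark(
  multisession = future_lapply(n, f, future.seed = TRUE),
  times = 5L
)

# Time it again under the callr backend.
plan(callr, workers = 2L)
t_callr <- microbenchmark(
  callr = future_lapply(n, f, future.seed = TRUE),
  times = 5L
)

# Reset to sequential processing so no workers are left behind.
plan(sequential)

timings <- rbind(t_multisession, t_callr)
print(timings)
```

If the two backends turn out close to each other, the gap to mclapply() is mostly the price of launching separate R processes rather than something specific to callr.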

Also, the obvious point: the greater the overhead, the bigger the computational tasks need to be before parallelizing is worthwhile.
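To put a rough number on that trade-off, here is a back-of-the-envelope sketch. It assumes a deliberately simple cost model (fixed overhead plus perfectly divided work) and treats the median future_lapply() time reported above as if it were pure overhead. The helper name break_even_task_time is hypothetical.

```r
# Simplified cost model (an assumption, not a measurement):
#   sequential time = n_tasks * task_time
#   parallel time   = overhead + n_tasks * task_time / workers
# Setting the two equal and solving for task_time gives the break-even point.
break_even_task_time <- function(overhead, n_tasks, workers) {
  stopifnot(workers > 1, n_tasks > 0, overhead >= 0)
  overhead / (n_tasks * (1 - 1 / workers))
}

# Median future_lapply() time from the benchmark, in seconds,
# used here as a stand-in for the fixed overhead.
overhead_callr <- 3.09

break_even <- break_even_task_time(overhead_callr, n_tasks = 10, workers = 10)
round(break_even, 2)  # 0.34
```

Under these assumptions, each of the ten tasks would need to run for roughly a third of a second before the callr backend beats a plain sequential lapply(). The toy task in the benchmark (averaging 100 random numbers) is far below that threshold.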

@king-of-poppk

Would it make sense to start callr::r_session processes in the background, so that at least we do not have to wait for process startup after a period of idleness? Maybe workers would set the maximum number of sessions running in parallel, and minWorkersIdle (or similar) would set the minimum number of processes to start in advance to avoid initialization latency?
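For illustration only, a rough sketch of what reusing pre-started sessions could look like with callr::r_session directly. This is not how future.callr works and not an existing future.callr option. The helpers start_pool, collect, and pool_lapply are hypothetical names, and the sketch only covers the "keep sessions warm" part, not the proposed minWorkersIdle logic.

```r
library(callr)

# Hypothetical helper: start `n` persistent R sessions up front,
# so the startup cost is paid once rather than per task.
start_pool <- function(n) {
  lapply(seq_len(n), function(i) r_session$new(wait = TRUE))
}

# Hypothetical helper: block until a session reports a finished call.
collect <- function(rs) {
  repeat {
    rs$poll_process(-1)            # -1 waits without a timeout
    msg <- rs$read()
    if (is.null(msg)) next
    if (msg$code == 200L) {        # 200: the call has finished
      if (!is.null(msg$error)) stop(msg$error)
      return(msg$result)
    }
    if (msg$code >= 500L) stop("worker session failed")
  }
}

# Hypothetical helper: run f over xs in batches of length(pool),
# reusing the already-running sessions for every batch.
pool_lapply <- function(pool, xs, f) {
  if (length(xs) == 0L) return(list())
  out <- vector("list", length(xs))
  for (start in seq(1L, length(xs), by = length(pool))) {
    idx <- start:min(start + length(pool) - 1L, length(xs))
    for (k in seq_along(idx)) pool[[k]]$call(f, list(xs[[idx[k]]]))
    for (k in seq_along(idx)) out[[idx[k]]] <- collect(pool[[k]])
  }
  out
}

pool <- start_pool(2L)
res <- pool_lapply(pool, rep(1e2, 10), function(n) mean(rnorm(n)))
for (rs in pool) rs$close()
```

The trade-off is that warm sessions keep state between tasks, so a task could in principle be affected by what ran before it in the same session. A fresh process per task avoids that at the cost of startup time.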
