The effective number of parallel jobs decreases during the computation when the total number of jobs is larger than number of cores #644
-
Hello, I am fitting n = 1000 Stan models on a 64-core machine using the doFuture package. Below is my code.

Each model fit uses 4 cores/threads in parallel (parallel_chains = 4), so with 128 threads I should have 32 models fitting at the same time. That was true at the beginning of the computation: all 64 cores (128 threads) were indeed in use. I expected that whenever one fit finishes (freeing 4 threads), another model fit would start, so all CPUs would stay busy until only the last few models remain, at which point the number of active CPUs would taper off.

However, the total number of working CPUs decreased little by little: around the middle of the computation only ~32 cores were working, and this kept dropping until just 4 cores were active and the remaining models were computed one after another, which increased the total computation time a lot. In other words, the effective number of parallel jobs decreased little by little until it reached 4 (parallel_chains = 4).

Could you please help me fix this problem? I guess it is related to parallel_chains = 4 within a single model fit, but I don't know the exact reason. Thank you very much.
-
Hello. If I understand your description of the problem correctly, it sounds like a so-called "load balancing" issue, where you end up with parallel workers sitting idle toward the end.

The default behavior of doFuture, and also of its siblings future.apply and furrr, is to take all N iterations and chunk them up into W equally sized portions, where W is the number of workers. Each worker then processes one chunk. In your case, with N = 1000 and W = 64, each worker processes 15-16 Stan models. It sounds like some chunks finish much sooner than others, so at the end only a few workers are actually using the CPU.

There are ways to configure which chunking strategy to use. See Section 'Load balancing ("chunking")' in https://dofuture.futureverse.org/reference/doFuture.html. The default argument value is […]. So, try that and see if it helps.
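As an illustrative sketch (not from the thread itself): per the 'Load balancing ("chunking")' section of the doFuture documentation, chunking can be controlled through the `.options.future` argument of `foreach()`. Here, `fit_one()` is a hypothetical stand-in for the poster's Stan model-fitting call, and the worker count is assumed from the numbers in the discussion.

```r
library(doFuture)
library(foreach)

## With 128 threads and parallel_chains = 4 per fit, allow up to
## 32 concurrent model fits (numbers assumed from the discussion).
plan(multisession, workers = 32)

fits <- foreach(
  i = seq_len(1000),
  ## chunk.size = 1: hand out one iteration per future, so a worker
  ## that finishes early immediately picks up the next model instead
  ## of idling once its pre-assigned chunk is done.
  .options.future = list(chunk.size = 1L)
) %dofuture% {
  fit_one(i)  # placeholder for the cmdstanr fit with parallel_chains = 4
}
```

The trade-off: the default (one chunk per worker) minimizes dispatch overhead but lets slow chunks strand fast workers, while `chunk.size = 1L` adds per-iteration dispatch cost in exchange for dynamic balancing, which should pay off when each iteration is an expensive Stan fit.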