Failed to retrieve the result of MulticoreFuture (<none>) from the forked worker (on localhost; PID 62510). Post-mortem diagnostic: No process exists with this PID, i.e. the forked localhost worker is no longer alive. #498
Replies: 17 comments
-
Does it happen also when you use `plan(multisession)`? As for automatically restarting dead workers: unfortunately not; that's a frequently requested feature that's on the roadmap, but it involves a lot of work and several other things need to be in place first, so it won't happen any time soon.
-
fit.fxn <- function(df, formula, control) {
  # drm() comes from the drc package
  res <- drm(formula,
    data = df,
    fct = LL2.4(names = c("Slope", "Lower", "Upper", "EC50")),
    control = control
  )
  res
}

Are you suggesting I use `multisession`? I am able to reproduce this error by knocking out child processes that were started using `multicore`.
So I'm not convinced that dropping in `multisession` addresses whatever is killing the workers in the first place.
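For reference, the `multicore`-to-`multisession` swap that comes up in this thread is a one-line change of the plan. A minimal sketch (the toy `square` function stands in for the real `fit.fxn`; `future` and `furrr` are assumed installed):

```r
library(future)
library(furrr)

# multisession spawns fresh background R sessions instead of forking,
# so a kill -9 test against forked children no longer applies
plan(multisession, workers = 2)

# toy stand-in for the real fit.fxn / drm() call
square <- function(x) x^2
res <- future_map(as.list(1:4), square)
unlist(res)  # 1 4 9 16
```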
-
For troubleshooting purposes: something is causing one or more of your parallel workers to die, not just throw an error but crash so that the R process terminates. That can happen for several reasons. You can also set `options(future.globals.onReference = "error")` to (99%) rule out that you're using objects that cannot be exported to parallel workers (https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html). If you get an error when running with the above, that's another good clue, especially since forked processing sometimes misleads us into believing it works (when it's actually very unstable). It could also be that you're running out of memory on the workers, causing them to crash. I'm almost 100% certain the underlying problem is unrelated to the future package per se. If you replace your `furrr::future_map()` call with a plain `parallel::mclapply()` call and the crash remains, that confirms it. Also, there was a bug in R's parallel package that was just fixed that could possibly also explain this problem. (*)

(*) R Core & mclapply author Simon Urbanek wrote on R-devel (April 2020): "Do NOT use mcparallel() in packages except as a non-default option that user can set ... Multicore is intended for HPC applications that need to use many cores for computing-heavy jobs, but it does not play well with RStudio and more importantly you [as the developer] don't know the resource available so only the user can tell you when it's safe to use."
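As a concrete sketch of the triage step above (the trivial job here is a hypothetical stand-in; real code would call `drm()`):

```r
library(future)

# Error out early if any exported global holds a non-exportable reference
# object such as an external pointer; note this check can also false-alarm.
options(future.globals.onReference = "error")

# multicore forks on Unix; it falls back to sequential where forking
# is unavailable
plan(multicore, workers = 2)

f <- future({ Sys.getpid() })  # trivial job standing in for the real fit
pid <- value(f)
is.integer(pid)
```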
-
Hmmm, this is helpful. The difficult thing is that I can't reproduce this consistently just by running my code, so it's difficult to tell whether just dropping in `multisession` fixes it.
I set this and I do see errors. Are you suggesting that those objects are causing the error? I don't have control over most of the code, since I'm calling out to an external package, so I'm not sure I would be able to remove all of those errors. Is the idea that those non-exportable objects could be what's crashing the forked workers?
This is possible... I'm a bit skeptical that this is the issue, since all workers should have the same workloads and I'd expect this error to show up more consistently.
This is super interesting! That is the exact error, though I'm reasonably sure the package I'm using isn't calling `mcparallel()` directly.
-
I think that's a very strong clue. Exactly what does the error say? The error message may provide clues about what type of object is involved or which package it originates from.
Note that you get that error message for any type of problem that causes your multicore worker to die, so there can still be many different reasons why it's dying.
It can also be one of the dependencies of the package you're calling.
-
Here's the error I'm getting:
It's not immediately clear to me where I should be looking; I'd love some advice on how to narrow down the issue.
I don't understand this. Shouldn't both `multicore` and `multisession` run the code in separate processes?
-
Me neither; I was hoping it would mention a variable name as a lead, e.g. "... one of the globals ('var_a') ...".
Yes, they do, with the exception that 'multisession' does not fork; it spawns a new R process that runs in the background. It's only 'multicore' that relies on forking, which is a very special concept in operating systems/parallel processing. It's important to distinguish forking from all other types of parallel processing. The core problem with forking is that you cannot safely run everything inside a forked process. So, it can very well be that your code crashes under `multicore` yet works under `multisession`.
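The fork-versus-spawn distinction can also be seen with base `parallel` alone. A sketch (the forked half is Unix-only; on Windows `mc.cores` must be 1):

```r
library(parallel)

# Forked workers (multicore-style): children inherit the parent's memory
# state at fork time. Unix only.
forked <- mclapply(1:2, function(i) Sys.getpid(), mc.cores = 2)

# Spawned workers (multisession-style): brand-new background R sessions;
# nothing is inherited, so globals must be shipped over explicitly.
cl <- makeCluster(2)
spawned <- parLapply(cl, 1:2, function(i) Sys.getpid())
stopCluster(cl)

# The spawned PIDs belong to processes other than the master
setdiff(unlist(spawned), Sys.getpid())
```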
-
Oh, BTW, and importantly: the error reported by `options(future.globals.onReference = "error")` can be a false positive, so treat it as a clue rather than proof.
-
Thanks so much for your help! I was finally able to isolate a test case that fails consistently. The error still shows up there. One thing I've noticed is that `multicore` is much faster than `multisession`.
-
That's great. Then, to 100% rule out that it's related to the future framework, you could replace your `fit = furrr::future_map(data, fit.fxn, formula = effect ~ bar)` call with the following counterpart: `fit = parallel::mclapply(data, fit.fxn, formula = effect ~ bar)`. If that also crashes in your test case, then we can be pretty certain it has to do with forking.
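With a toy fit function standing in for the real `drm()` call, the two calls line up as follows (a sketch only; `data`, `fit.fxn`, and the unused `effect ~ bar` formula mirror the thread's names):

```r
library(parallel)
library(future)
library(furrr)

fit.fxn <- function(df, formula) nrow(df)  # toy stand-in; ignores the formula
data <- list(data.frame(x = 1:3), data.frame(x = 1:5))

plan(multicore)
fit_future <- future_map(data, fit.fxn, formula = effect ~ bar)
fit_mc     <- mclapply(data, fit.fxn, formula = effect ~ bar)

identical(fit_future, fit_mc)  # both are list(3L, 5L)
```

The formula is never evaluated by the toy function, so the undefined `effect` and `bar` variables are harmless here, just as they would be when passed through to a model-fitting call.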
Yes, forking is faster since it's more lightweight, has less overhead, and is implemented by the OS itself. Unfortunately, there's nothing magic we can do to lower the overhead of the other parallelization backends to the same level. BTW, if your test case is small and can be shared, please consider doing so. It might help someone else, and it might even be that someone sees it and fixes it.
-
Yeah, let me try to get a reprex together over this weekend and also try the `parallel::mclapply()` version.
-
@nlarusstone I'm really interested to see what you did to replicate the problem. I have a shiny app using `multicore` futures, and I'm hitting this error intermittently.
It also seems to be related to #226. I'm thinking of just setting the plan to `multisession` instead.
-
@nlarusstone, install: `remotes::install_github("HenrikBengtsson/future", ref = "09f9b7d")` and then retry with: `options(future.globals.onReference = "error")`. I'm quite certain that the error on external pointers will go away. If it does, you can rule out that external pointers are the problem. (The above version fixes a bug where the future framework would think that there are external pointers when there aren't.)
-
@tyluRp unfortunately I don't have any good advice for you. We happened to get lucky in that a specific set of data consistently reproduces this error. I'm working on a minimal reprex, but it's difficult to get the error to reproduce.
-
@HenrikBengtsson I installed that version and you're right, that error went away. I'm still not quite sure what the exact error was... but it seems the external-pointer report was a false alarm.
-
Awesome. So that rules out one thing for you: the previous error about external pointers (https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html) was a false alert and was not the cause of your original problem.
-
And if the problem comes back when you go back to `multicore`, that tells us the crash itself is unrelated to the external-pointer check.
-
I'm getting the following error:
Failed to retrieve the result of MulticoreFuture (<none>) from the forked worker (on localhost; PID 62510). Post-mortem diagnostic: No process exists with this PID, i.e. the forked localhost worker is no longer alive.
This error occurs intermittently in the normal course of running our scripts.
It's difficult for me to provide a reprex, as the only way for me to consistently reproduce this is by running my code and sending `kill -9` signals to the child processes that are spawned. Here's the code that produces the error:
Is there a way for me to ensure that if a child process dies it gets restarted?
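Restarting dead workers automatically isn't currently supported by the framework (see the reply near the top of the thread). A crude manual workaround is to catch the failure and rerun the whole map; this is only a sketch, and `retry_map` is a hypothetical helper, not part of `future` or `furrr`:

```r
library(future)
library(furrr)

plan(multicore)

# Retry the entire map if any worker dies mid-run.
retry_map <- function(xs, fn, tries = 3) {
  for (attempt in seq_len(tries)) {
    res <- tryCatch(future_map(xs, fn), error = identity)
    if (!inherits(res, "error")) return(res)
    message("worker failure on attempt ", attempt, ": ",
            conditionMessage(res))
  }
  stop("all ", tries, " attempts failed")
}

retry_map(as.list(1:3), function(x) x + 1)  # list(2, 3, 4)
```

Note this rereuns all elements, not just the ones whose worker died, so it's wasteful for long jobs; but it keeps a script alive through an intermittent crash.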
Here's my session info: