future() and resolved() handle FutureError:s differently for different backends
Future orchestration errors (i.e. FutureError) occurring when calling future() and resolve() are handled differently depending on the future backend. Below are a few examples.
multicore
library(future)
plan(multicore, workers = 2)

segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Launching future #4
#> Warning message:
#> In mccollect(jobs = jobs, wait = TRUE) :
#>   1 parallel job did not deliver a result
#> Calls: lapply ... value.Future -> result -> result.MulticoreFuture -> mccollect
resolved(fs)
#> [1] TRUE TRUE TRUE TRUE

fs <- resolve(fs)
rs <- lapply(fs, FUN = result)
#> Error: Failed to retrieve the result of MulticoreFuture (<none>) from the
#> forked worker (on localhost; PID 687068). Post-mortem diagnostic: No
#> process exists with this PID, i.e. the forked localhost worker is no longer
#> alive. The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')

vs <- value(fs)
#> Error: Failed to retrieve the result of MulticoreFuture (<none>) from the
#> forked worker (on localhost; PID 687068). Post-mortem diagnostic: No
#> process exists with this PID, i.e. the forked localhost worker is no longer
#> alive. The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')
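As a workaround today, the FutureError can be caught when collecting results, since it is signaled as a regular R condition of class 'FutureError'. A minimal sketch, assuming the multicore setup above; the NULL fallback is purely illustrative:

## Sketch: guard result collection against orchestration failures.
## 'FutureError' is the condition class signaled by the future package.
vs <- lapply(fs, FUN = function(f) {
  tryCatch(value(f), FutureError = function(e) {
    message("orchestration error: ", conditionMessage(e))
    NULL  # illustrative placeholder for the crashed future's value
  })
})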
multisession
library(future)
plan(multisession, workers = 2)

segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Error in unserialize(node$con) :
#> MultisessionFuture (<none>) failed to receive message results from
#> cluster RichSOCKnode #2 (PID 687611 on localhost 'localhost'). The reason
#> reported was 'error reading from connection'. Post-mortem diagnostic: No
#> process exists with this PID, i.e. the localhost worker is no longer alive.
#> The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')
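Note that with multisession the FutureError is signaled already at future() creation time, presumably when future() needs a free worker slot and has to collect from the crashed worker. A minimal sketch of guarding the launch step too, assuming the same setup as above:

## Sketch: future() itself may signal the FutureError here, so the
## creation step needs a handler as well.
fs <- lapply(1:4, FUN = function(ii) {
  tryCatch(future({ segfault(ii) }), FutureError = function(e) e)
})
## 'fs' now holds a mix of Future objects and FutureError conditions,
## which downstream code has to tell apart.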
future.callr::callr
library(future)
plan(future.callr::callr, workers = 2)

segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Launching future #4

resolved(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

fs <- resolve(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

rs <- lapply(fs, FUN = result)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

vs <- value(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')
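With this backend even resolved() signals the error, so polling code needs a handler as well. A minimal sketch, assuming the setup above; the NA fallback is purely illustrative:

## Sketch: guard the polling step too.
done <- tryCatch(resolved(fs), FutureError = function(e) {
  message("orchestration error while polling: ", conditionMessage(e))
  rep(NA, times = length(fs))  # resolution state unknown
})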
Suggestion
Harmonize the behavior. This is related to releasing future slots for failed futures.
future() should attempt to release future slots, if possible. If there are available slots, it should use those and only produce a FutureError if no more slots are available. A hypothetical sketch of this idea follows below.
resolved() may throw a FutureError, per https://future.futureverse.org/articles/future-6-future-api-backend-specification.html#resolved-method.
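To illustrate the suggested future() behavior, here is a purely hypothetical sketch; availableSlots(), releaseFailedFutures(), and acquireSlot() are made-up helper names that do not exist in the future package:

## Hypothetical sketch only -- none of these helpers exist in 'future'.
requestSlot <- function(backend) {
  ## First try to reclaim slots held by futures whose workers failed
  if (availableSlots(backend) == 0L) {
    releaseFailedFutures(backend)
  }
  ## Signal a FutureError only if no slot could be reclaimed
  if (availableSlots(backend) == 0L) {
    err <- structure(
      list(message = "no free worker slots available", call = NULL),
      class = c("FutureError", "error", "condition")
    )
    stop(err)
  }
  acquireSlot(backend)
}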
See also
This is related to futureverse/future.callr#11.