future() and resolved() handle FutureError:s differently for different backends #696

Open
HenrikBengtsson opened this issue Aug 8, 2023 · 0 comments

Issue

Future orchestration errors (i.e. FutureError) that occur when calling future(), resolved(), or resolve() are handled differently depending on the future backend. Below are a few examples, in each of which the second of four futures kills its own worker process.

multicore

library(future)
plan(multicore, workers = 2)

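# Simulate a worker crash: future #2 terminates its own R process
# (tools::pskill() sends SIGTERM by default)
segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}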

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Launching future #4
#> Warning message:
#> In mccollect(jobs = jobs, wait = TRUE) :
#>   1 parallel job did not deliver a result
#> Calls: lapply ... value.Future -> result -> result.MulticoreFuture -> mccollect

resolved(fs)
#> [1] TRUE TRUE TRUE TRUE

fs <- resolve(fs)

rs <- lapply(fs, FUN = result)
#> Error: Failed to retrieve the result of MulticoreFuture (<none>) from the
#> forked worker (on localhost; PID 687068). Post-mortem diagnostic: No
#> process exists with this PID, i.e. the forked localhost worker is no longer
#> alive. The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')

vs <- value(fs)
#> Error: Failed to retrieve the result of MulticoreFuture (<none>) from the
#> forked worker (on localhost; PID 687068). Post-mortem diagnostic: No
#> process exists with this PID, i.e. the forked localhost worker is no longer
#> alive. The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')

multisession

library(future)
plan(multisession, workers = 2)

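# Simulate a worker crash: future #2 terminates its own R process
# (tools::pskill() sends SIGTERM by default)
segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}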

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Error in unserialize(node$con) : 
#>   MultisessionFuture (<none>) failed to receive message results from
#> cluster RichSOCKnode #2 (PID 687611 on localhost 'localhost'). The reason
#> reported was 'error reading from connection'. Post-mortem diagnostic: No
#> process exists with this PID, i.e. the localhost worker is no longer alive.
#> The total size of the 2 globals exported is 6.52 KiB. There are two
#> globals: 'segfault' (6.47 KiB of class 'function') and 'ii' (56 bytes of
#> class 'numeric')

future.callr::callr

library(future)
plan(future.callr::callr, workers = 2)

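# Simulate a worker crash: future #2 terminates its own R process
# (tools::pskill() sends SIGTERM by default)
segfault <- function(ii) {
  if (ii == 2) tools::pskill(Sys.getpid()) else Sys.sleep(1)
  ii
}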

fs <- lapply(1:4, FUN = function(ii) {
  message(sprintf("Launching future #%d", ii))
  future({ segfault(ii) })
})
#> Launching future #1
#> Launching future #2
#> Launching future #3
#> Launching future #4

resolved(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

fs <- resolve(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')


rs <- lapply(fs, FUN = result)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

vs <- value(fs)
#> Error: CallrFuture (<none>) failed. The reason reported was '! callr
#> subprocess failed: could not start R, exited with non-zero status, has
#> crashed or was killed'. Post-mortem diagnostic: The parallel worker
#> (PID 686807) started at 2023-08-08T09:08:46+0000 finished with exit
#> code -15. The total size of the 2 globals exported is 6.52 KiB. There
#> are two globals: 'segfault' (6.47 KiB of class 'function') and 'ii'
#> (56 bytes of class 'numeric')

Suggestion

Harmonize this behavior across backends. This is related to releasing future slots for failed futures.
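
Until the behavior is harmonized, user code that needs to behave the same on every backend could catch the FutureError at each stage. The following is only a minimal sketch of such a workaround, not something the future package provides: `try_each()` and `is_future_error()` are hypothetical helper names. It relies only on the fact that orchestration errors are signaled as conditions of class "FutureError".

```r
# Hypothetical helper (not part of the future package): apply `fun` to each
# element of `xs` and capture any FutureError instead of letting it abort
# the whole loop. Other errors are not caught and propagate as usual.
try_each <- function(xs, fun) {
  lapply(xs, function(x) {
    tryCatch(fun(x), FutureError = function(e) e)
  })
}

# TRUE for elements where an orchestration error was captured
is_future_error <- function(x) inherits(x, "FutureError")
```

Because the stage at which the error surfaces depends on the backend, the same wrapper would have to be used both when launching, e.g. `fs <- try_each(1:4, function(ii) future::future(segfault(ii)))`, and when collecting, e.g. `vs <- try_each(fs[!vapply(fs, is_future_error, logical(1))], future::value)`. This does not address the release of future slots mentioned above.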

See also

This is related to futureverse/future.callr#11.

HenrikBengtsson changed the title from "future() and resolved() handles FutureError:s differently for different backends" to "future() and resolved() handle FutureError:s differently for different backends" on Aug 9, 2023