Fix tests hanging when a worker dies unexpectedly due to an unrecoverable error such as a segfault #81
This fixes flatware hanging indefinitely when a worker dies unexpectedly due to an unrecoverable error such as a segmentation fault.
Context
When a worker dies unexpectedly, flatware currently has no CHLD signal handling in place to catch the death of that process and recover. This can happen if a spec triggers a segmentation fault. In our specs this happens rarely and randomly, but when it does, our CI sits there waiting forever.
Solution
The proposed solution is to introduce a class that's responsible for "managing" workers. Specifically, this class will listen for CHLD signals and re-spawn a worker if the process was killed unexpectedly.
We can define "unexpected" based on whether the sink proactively deleted the worker when returning Job.sentinel.
Additionally, when the worker is re-spawned, we re-assign it the same work that it was previously assigned.
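To make the approach concrete, here is a minimal sketch of what such a manager could look like. It is not the code in this PR: WorkerManager, retire, reap and respawned are hypothetical names, and the sketch assumes workers are plain forked processes. retire stands in for the sink deleting a worker when it returns Job.sentinel. The CHLD handler only writes a byte to a pipe (the self-pipe trick), so the actual reaping happens outside signal context.

```ruby
# Hypothetical sketch: WorkerManager, #retire and #reap are illustrative
# names, not flatware's actual API.
class WorkerManager
  attr_reader :respawned

  def initialize(&work)
    @work = work      # block run inside each forked worker; receives the job
    @jobs = {}        # pid => job currently assigned to that worker
    @respawned = []   # jobs that had to be handed to a replacement worker
    @reader, @writer = IO.pipe
    # Self-pipe trick: the handler only writes a byte, so no real work
    # happens in signal context.
    Signal.trap('CHLD') do
      begin
        @writer.write_nonblock('.')
      rescue StandardError
        # pipe full or closed: a wake-up is already pending
      end
    end
  end

  def spawn(job)
    pid = fork do
      @work.call(job)
      exit!(0)        # skip at_exit hooks inherited from the parent
    end
    @jobs[pid] = job
    pid
  end

  # Stand-in for the sink deleting a worker when it hands out the sentinel
  # job: after this call, the worker's exit counts as expected.
  def retire(pid)
    @jobs.delete(pid)
  end

  # Waits (up to `timeout` seconds) for a CHLD wake-up, reaps every dead
  # child, and re-spawns the ones that were not retired, giving each
  # replacement the same job. Returns the pids of the replacement workers.
  def reap(timeout = 5)
    return [] unless IO.select([@reader], nil, nil, timeout)
    @reader.read_nonblock(1024) # drain wake-ups; CHLD signals may coalesce

    dead = []
    loop do
      pid, _status = begin
        Process.wait2(-1, Process::WNOHANG) # note: reaps any child process
      rescue Errno::ECHILD
        nil # no children left at all
      end
      break unless pid
      dead << pid
    end

    # Re-spawn only after reaping, so a replacement is never reaped in the
    # same pass that created it.
    dead.map do |pid|
      job = @jobs.delete(pid) # nil when the exit was expected
      next unless job
      @respawned << job
      spawn(job)
    end.compact
  end
end
```

In a real integration the sink's main loop would presumably select on the manager's pipe alongside its other sockets rather than call reap directly; that wiring is left out here.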
Alternative Solutions
Possible alternative solutions:
Open Questions
TODO
Assuming we move forward with this approach:
Looking for any feedback on the high-level approach before updating the specs. Thanks!