Add re-submission of tasks during spot interruption disconnects #516
base: master
Can someone help review this PR to see if it's OK? It's identical to #485, but I opened a new PR so that it's eligible for Hacktoberfest 😅
The feature looks very interesting. AFAICT it does what it says, but it is a hard feature to test.
Yeah, I'll think about how to mock a spot interruption event and see whether it's possible using the AWS SDK. If anyone has any ideas on how to do so, that would be super helpful!
final boolean isUnexpectedDisconnection = computer.isOffline() && computer.getOfflineCause()
        instanceof OfflineCause.ChannelTermination;
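To make the condition in the reviewed line concrete outside Jenkins, here is a minimal, self-contained sketch of the same predicate. `Cause`, `ChannelTermination`, `OtherCause`, and `AgentState` are hypothetical stand-ins for Jenkins' `OfflineCause` hierarchy and `Computer`; they are not the real classes, and only the two facts the line reads (offline or not, and why) are modeled.

```java
// Hypothetical stand-ins for Jenkins' OfflineCause hierarchy and Computer.
// They model only the two facts the reviewed line reads.
public class DisconnectCheckSketch {

    /** Stand-in for OfflineCause (hypothetical, not the Jenkins class). */
    interface Cause {}

    /** Stand-in for OfflineCause.ChannelTermination: the remoting channel was cut. */
    static final class ChannelTermination implements Cause {}

    /** Stand-in for any other offline cause, e.g. an admin taking the agent offline. */
    static final class OtherCause implements Cause {}

    /** Stand-in for Computer: whether it is offline and why. */
    static final class AgentState {
        final boolean offline;
        final Cause offlineCause; // null when online

        AgentState(boolean offline, Cause offlineCause) {
            this.offline = offline;
            this.offlineCause = offlineCause;
        }
    }

    /** Same logic as the reviewed line: offline AND the cause is a channel termination. */
    static boolean isUnexpectedDisconnection(AgentState agent) {
        return agent.offline && agent.offlineCause instanceof ChannelTermination;
    }

    public static void main(String[] args) {
        // Channel cut while offline: treated as unexpected, prints true.
        System.out.println(isUnexpectedDisconnection(new AgentState(true, new ChannelTermination())));
        // Offline for some other reason: not treated as unexpected, prints false.
        System.out.println(isUnexpectedDisconnection(new AgentState(true, new OtherCause())));
        // Online agent with no cause: never unexpected, prints false.
        System.out.println(isUnexpectedDisconnection(new AgentState(false, null)));
    }
}
```

The second call illustrates the limitation raised in the review comment that follows: if a spot interruption surfaces with any offline cause other than `ChannelTermination`, this check returns false and nothing is re-submitted.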
Some customers have reported that OfflineCause.ChannelTermination is not always triggered for spot interruptions. You may be able to dig into this further here: jenkinsci/ec2-fleet-plugin#121
This seems to have been approved in October 2020. Is it going to be merged soon? This would be really helpful for us.
Yes, this would be really awesome to add. Any plans?
Hello, we're also looking forward to this feature. |
It should be possible to test it now with this new-ish AWS feature:
We are looking forward to using this feature. When is it expected to be released?
This PR adds a new feature: re-submission of tasks for agents that are disconnected due to a spot interruption event in AWS.
Whenever an agent disconnects, two checks run: whether the disconnect was unexpected, and whether it was caused by a spot interruption event. If both are true, the tasks that were running on the agent are re-submitted to the queue.
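The two checks and the re-submission step can be sketched as a small, self-contained Java model. This is a hypothetical illustration, not the PR's code: `Task`, `Agent`, `afterDisconnect`, and the in-memory queue are made-up stand-ins for the agent's executors and the Jenkins build queue, and the `spotInterruption` flag assumes the interruption has already been detected by some means not shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the disconnect-then-resubmit flow. Task, Agent and the
// queue are illustrative stand-ins, not Jenkins or AWS SDK types.
public class ResubmitSketch {

    /** Stand-in for a unit of work held by the build queue. */
    static final class Task {
        final String name;
        Task(String name) { this.name = name; }
    }

    /** Stand-in for a disconnected agent and the two facts the checks rely on. */
    static final class Agent {
        final boolean unexpectedDisconnect; // e.g. the remoting channel was terminated
        final boolean spotInterruption;     // assumption: already derived from an AWS interruption signal
        final List<Task> runningTasks;

        Agent(boolean unexpectedDisconnect, boolean spotInterruption, List<Task> runningTasks) {
            this.unexpectedDisconnect = unexpectedDisconnect;
            this.spotInterruption = spotInterruption;
            this.runningTasks = runningTasks;
        }
    }

    /** Stand-in for the build queue; re-submitted tasks are appended in FIFO order. */
    final Deque<Task> queue = new ArrayDeque<>();

    /**
     * Hook run after an agent disconnects. Re-submits the agent's running tasks
     * only if BOTH checks pass, and returns how many tasks were re-submitted.
     */
    int afterDisconnect(Agent agent) {
        if (!agent.unexpectedDisconnect || !agent.spotInterruption) {
            return 0; // planned shutdown or non-spot failure: leave the builds as they are
        }
        for (Task task : agent.runningTasks) {
            queue.addLast(task);
        }
        return agent.runningTasks.size();
    }

    public static void main(String[] args) {
        ResubmitSketch controller = new ResubmitSketch();
        List<Task> running = List.of(new Task("build-a"), new Task("build-b"));

        System.out.println(controller.afterDisconnect(new Agent(true, true, running)));  // 2
        System.out.println(controller.afterDisconnect(new Agent(true, false, running))); // 0
        System.out.println(controller.queue.size());                                     // 2
    }
}
```

Keeping the two checks as separate inputs means the spot-interruption signal does not have to come from the same place as the channel state, which matters if `ChannelTermination` is not reliably reported for every interruption.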
Motivation
Builds may fail when spot instances are terminated. This PR can help reduce the number of build failures caused by spot interruption events.
Notes
This does not guarantee that build failures will be prevented. There doesn't seem to be any documentation on how tasks can be re-submitted. This PR is inspired by another Jenkins plugin that already implements this behaviour: https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetAutoResubmitComputerLauncher.java