Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support fail fast for node removal #998

Closed
wants to merge 5 commits into from

Conversation

car-roll
Copy link
Contributor

@car-roll car-roll commented Oct 28, 2024

jenkinsci/workflow-durable-task-step-plugin#372 added fail fast for cloud node removal. However, this feature is not supported by the ec2 plugin.

This PR refactors EC2AbstractSlave and EC2Computer in order to take advantage of jenkinsci/workflow-durable-task-step-plugin#372

Testing done

  • Perform manual testing by killing spot instance manually and watching startup

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@car-roll car-roll changed the title Use AbstractCloudSlave and AbstractCloudComputer to support FailFast support fail fast for node removal Oct 28, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing in CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this one is mystifying me. I'm still trying to figure out why

@@ -526,15 +523,23 @@ public boolean isAcceptingTasks() {
void idleTimeout() {
LOGGER.info("EC2 instance idle time expired: " + getInstanceId());
if (!stopOnTerminate) {
terminate();
try {
terminate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ignore WS)

@jglick
Copy link
Member

jglick commented Oct 29, 2024

Looks right, but has it been tested?

@car-roll
Copy link
Contributor Author

Cannot read field "terminateScheduled" because "node" is null
Comparing the test logs from master and this PR, everything looks similar up until the very end. Both versions show the ec2 instance idle timeout and terminated/removed, but then this version has this error that the node is null. I guess something funky is going on with this version's terminate. Terminated too soon maybe?

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #998 (comment) I guess that

EC2AbstractSlave node = c.getNode();
assertTrue(node.terminateScheduled.await(10, TimeUnit.SECONDS));
was not right to begin with (Computer.getNode may return null if the agent definition has been removed from the system before the executor is released and the computer destroyed) but happened to pass before due to different timing conditions. It is not exactly clear to me what the test is supposed to be asserting here. I guess you could just skip the last assertion in case node == null since this would be normal if the build is finishing.

src/main/java/hudson/plugins/ec2/EC2OndemandSlave.java Outdated Show resolved Hide resolved
src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated Show resolved Hide resolved
src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated Show resolved Hide resolved
src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated Show resolved Hide resolved
@res0nance res0nance added the enhancement Feature additions or enhancements label Nov 4, 2024
@car-roll
Copy link
Contributor Author

car-roll commented Dec 2, 2024

closing in favor of #1015 due to the branch conflicts

@car-roll car-roll closed this Dec 2, 2024
@jglick
Copy link
Member

jglick commented Dec 3, 2024

For the future, please retain the same PR and just resolve conflicts in it. In the worst case, if running a merge tool became completely unwieldy and it was really easiest to start from scratch, just reset your branch to origin/master, reapply changes, and force push. At least the context of previous reviews is retained.

@car-roll
Copy link
Contributor Author

car-roll commented Dec 4, 2024

@jglick yeah I was having a terrible time with the merge conflicts but I didn't think about force pushing. Will do that next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature additions or enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants