Finish HyperQueue jobs instead of cancelling in load balancer #51
Comments
Hi, I also experience this, and I found it confusing as I thought my evaluations were failing! It looks like:
despite the evaluation being successful.
Hi @jonmaddock, this is, as it stands, expected. We don't have a cleaner way of shutting down a UM-Bridge server inside an HQ job after model evaluation. I do see, however, that it's not ideal, since the cancellation looks like an error. Maybe we could make servers optionally (via an environment variable) accept a termination signal from the client side. We discussed such a signal before, and it should not be the default behaviour, but opt-in via an environment variable set in the job scripts would be acceptable, I think... @annereinarz @chun9l Do you have any opinions on that? @monabraeunig maybe an interesting next task for you after error handling?
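For illustration, the opt-in idea could look roughly like the sketch below. It uses only Python's standard library and is not UM-Bridge's actual server code: the variable name `ALLOW_CLIENT_TERMINATION` and the `/Terminate` endpoint are hypothetical, chosen just to show the pattern. The server ignores termination requests unless the job script sets the variable, and when it does accept one, it returns from its serve loop normally so the process can exit with status 0 instead of being cancelled from outside.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical variable name; not an existing UM-Bridge setting.
OPT_IN_VAR = "ALLOW_CLIENT_TERMINATION"


def termination_enabled(environ=os.environ):
    """Return True only if the job script explicitly opted in."""
    return environ.get(OPT_IN_VAR, "0") == "1"


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical endpoint; rejected unless the opt-in variable is set.
        if self.path == "/Terminate" and termination_enabled():
            self.send_response(200)
            self.end_headers()
            # shutdown() blocks until serve_forever() returns, so it is
            # called from a separate thread to avoid deadlocking the handler.
            threading.Thread(target=self.server.shutdown, daemon=True).start()
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def serve(port=0):
    """Serve until a client requests termination, then return normally."""
    server = HTTPServer(("127.0.0.1", port), Handler)
    server.serve_forever()
    server.server_close()
```

A job script would opt in with something like `export ALLOW_CLIENT_TERMINATION=1` before starting the server; without it, the behaviour stays as it is today.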
@linusseelinger thanks for your response. I completely understand your reasoning; the behaviour was confusing for a newcomer, that's all. Perhaps I could add this current behaviour to the documentation for HPC if we don't proceed with this fix?
@jonmaddock that'd be great! Let's first see if we can come up with a clean solution regarding termination though; cancelled jobs have already surprised other users before.
No description provided.