I don't think we see this very often, but if I'm understanding it correctly, it manages job state on a timer, independently of everything else that manages job state.

Realistically this is rare (I've seen it once in the last five days) and the consequence isn't that bad, but it muddies the separation of concerns around how a job gets moved to `stopped`.

We should find the "normal" path by which an execution gets moved to `stopped` to double-check my logic here.
If `reapExecutions()` were removed, we might eventually see a job stuck in `stopping`. That would be a clue that a pod did not shut down properly and the job needs to be force-stopped to clean up its resources.

My guess is that the majority of the time this runs, the pod shuts down right after we change the status to `stopped`.
I think the way `reapExecutions` is used ends up in a race with the Kubernetes pod shutdown timeout:

`teraslice/packages/teraslice/src/lib/cluster/services/execution.ts`, Line 515 in f1bd147
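To make the race concrete, here is a minimal sketch of what a timer-based reaper like this looks like. This is a hypothetical illustration, not the actual Teraslice implementation: the `Execution` shape, the `stoppingSince` field, and the function signature are all assumptions made for the example.

```typescript
// Hypothetical sketch of a timer-driven "reaper" that force-moves executions
// from "stopping" to "stopped", independently of the normal shutdown path.
// Types and names are illustrative only, not the Teraslice API.

type Status = 'running' | 'stopping' | 'stopped';

interface Execution {
    exId: string;
    status: Status;
    stoppingSince?: number; // epoch ms when the execution entered "stopping"
}

// Force-stop any execution that has been "stopping" longer than timeoutMs.
// Returns the ids of the executions it reaped.
function reapExecutions(executions: Execution[], timeoutMs: number, now: number): string[] {
    const reaped: string[] = [];
    for (const ex of executions) {
        if (ex.status === 'stopping'
            && ex.stoppingSince !== undefined
            && now - ex.stoppingSince > timeoutMs) {
            // This write is what races with the pod's own shutdown: if the pod
            // finishes terminating at the same moment, two independent actors
            // are both deciding when the execution becomes "stopped".
            ex.status = 'stopped';
            reaped.push(ex.exId);
        }
    }
    return reaped;
}

// Usage: run on an interval, e.g. setInterval(() => reapExecutions(execs, 60_000, Date.now()), 10_000)
```

The point of the sketch is that the reaper's timeout and the Kubernetes pod termination grace period are two unrelated clocks, so whichever fires first decides the final state transition.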