Resuming old jobs #2

Open
alexanderwerning opened this issue Jul 26, 2024 · 0 comments

I ran autoexperiment, but my jobs crashed due to an error, so I stopped the autoexperiment process. When I tried to restart them with autoexperiment the next day, I got these messages for each experiment:

Resume <exp> from job id: <job id>
Current job id for <exp>: <job id>
slurm_load_jobs error: Invalid job id specified
Command 'squeue -j <job id>' returned non-zero exit status 1.
Retrying again in 10 mins for <exp>...

It seems that Slurm no longer tracks these job ids, so the squeue call fails and autoexperiment keeps retrying. Do I always have to delete all the logs/ files manually before restarting?
Is this behaviour intended?
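
To illustrate the kind of check I have in mind, here is a minimal sketch. It is not autoexperiment's code: the names `run_cmd`, `job_state` and `is_stale_job_id` are hypothetical. It assumes that squeue exits non-zero with "Invalid job id specified" on stderr once Slurm has purged a job, as in the log above, and it treats such a job id as stale instead of retrying.

```python
import subprocess


def run_cmd(args):
    """Run a command and return (exit_code, stdout, stderr)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def job_state(job_id, run=run_cmd):
    """Return the job's Slurm state (e.g. "RUNNING"), or None if squeue
    no longer knows the job id.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    # -h drops the header line, -o "%T" prints only the job state.
    code, out, err = run(["squeue", "-h", "-j", str(job_id), "-o", "%T"])
    if code == 0:
        states = out.split()
        # Empty output: the job has already left the queue.
        return states[0] if states else None
    if "Invalid job id" in err:
        # Assumption: this message means the id was purged from Slurm.
        return None
    raise RuntimeError(f"squeue failed for job {job_id}: {err.strip()}")


def is_stale_job_id(job_id, run=run_cmd):
    """True if a stored job id can no longer be resumed by polling squeue."""
    return job_state(job_id, run) is None
```

With a check like this, a stale id could trigger a fresh submission rather than an endless retry loop. Note that a job that finished successfully and was later purged would also look stale here, so a real fix would need an additional check on the job's output.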

Thank you!
