RUNNING HANDLER [stackhpc.openhpc : Restart slurmd service] *******************************************************************
skipping: [cclr-dev-control]
fatal: [cclr-dev-compute-1]: FAILED! => {
"changed": false
}
MSG:
Unable to start service slurmd: Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details.
fatal: [cclr-dev-login-0]: FAILED! => {
"changed": false
}
From the journal:
9-25 16:19:33 UTC. --
Starting Slurm node daemon...
slurmd: error: _fetch_child: failed to fetch remote configs: Unexpected message received
error: _fetch_child: failed to fetch remote configs: Unexpected message received
slurmd: error: _establish_configuration: failed to load configs
slurmd: error: slurmd initialization failed
error: _establish_configuration: failed to load configs
error: slurmd initialization failed
slurmd.service: Main process exited, code=exited, status=1/FAILURE
slurmd.service: Failed with result 'exit-code'.
Failed to start Slurm node daemon.
Restarting slurmd fixes it. Is this a race on startup? Can we get it to retry?
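To make the retry idea concrete, here is a minimal sketch of what "retry the restart" could look like. It is not what the stackhpc.openhpc role does today. The helper names (`restart_with_retry`, `restart_slurmd`) and the attempt count and delay are assumptions for illustration, and it presumes the failure is transient, which the journal above suggests but does not prove.

```python
import subprocess
import time


def restart_with_retry(run_restart, attempts=5, delay=2.0, sleep=time.sleep):
    """Call run_restart() until it returns True or attempts run out.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. All parameters are hypothetical defaults.
    """
    for attempt in range(1, attempts + 1):
        if run_restart():
            return attempt
        if attempt < attempts:
            # Wait before the next try, e.g. to give slurmctld time to come up.
            sleep(delay)
    return None


def restart_slurmd():
    """Restart slurmd via systemctl; True when the command exits with 0."""
    result = subprocess.run(["systemctl", "restart", "slurmd"], check=False)
    return result.returncode == 0
```

On a compute or login node this would be called as `restart_with_retry(restart_slurmd)`. If a few spaced-out attempts reliably succeed, that would support the startup-race theory. If they never do, the cause is probably something else.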