Is it possible to honour the docker autostart delay? #31

Open
lonzodelorana opened this issue Jul 9, 2024 · 14 comments

Comments

@lonzodelorana

Updating Docker containers stops, updates, and starts each container immediately and then moves on to the next one. On older or low-power hardware, this can end with the containers updated but failing to start.

The Unraid Docker implementation has a built-in autostart delay, used to slow down container starts after a reboot for exactly this reason. Is it possible to honour this delay before starting the next update procedure, so that containers can update without manual intervention on this sort of setup, please?

@Commifreak
Owner

> on older or low power hardware this can end up with the containers updating but failing to start.

How? These actions happen in order. I don't know why this should fail, and with what error message?
The container delay is respected at container start, but it is applied AFTER the container has started.

If there are race conditions, it must be an issue with Docker itself. Please provide an error message (syslog?).

I could add a static 3-second wait after the internal Docker update. It should hurt nobody.

@lonzodelorana
Author

lonzodelorana commented Jul 9, 2024

Actually, it looks like I made a huge assumption about why the containers were not starting, and the Docker autostart delays are indeed honoured by appdata.backup. My sincere apologies.

I believe what is happening is this: a container that provides the networking for other containers is being updated, and as the dependent containers are recreated due to the networking change, appdata.backup can't find them to start them. Log snippet below.

[24.06.2024 03:02:55][ℹ️][network-container] Should NOT backup external volumes, sanitizing them...
[24.06.2024 03:02:55][ℹ️][network-container] Calculated volumes to back up: /mnt/user/appdata/network-container
[24.06.2024 03:02:55][ℹ️][network-container] Backing up network-container...
[24.06.2024 03:02:55][ℹ️][network-container] Backup created without issues
[24.06.2024 03:02:55][ℹ️][network-container] Verifying backup...
[24.06.2024 03:02:55][ℹ️][network-container] Installing planned update for network-container...
[24.06.2024 03:03:04][ℹ️][Main] Set containers to previous state
[24.06.2024 03:03:04][ℹ️][network-container] Starting network-container... (try #1) done!
[24.06.2024 03:03:08][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:03:13][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:03:18][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:03:18][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:03:22][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:03:27][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:03:32][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:03:32][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:03:35][ℹ️][container1] The container has a delay set, waiting 30 seconds before carrying on
[24.06.2024 03:04:05][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:04:10][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:04:15][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:04:15][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:04:17][ℹ️][container1] The container has a delay set, waiting 90 seconds before carrying on
[24.06.2024 03:05:47][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:05:52][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:05:57][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:05:57][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:06:00][ℹ️][container1] The container has a delay set, waiting 60 seconds before carrying on

When checking back in the Unraid GUI in the morning, all containers are present, just not started.
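For anyone comparing their own logs against the snippet above, a small Python sketch like the following can tally the "No such container" failures per container. It is only an illustration: the line pattern is inferred from the snippet, and `count_failed_starts` is a hypothetical helper, not part of appdata.backup.

```python
import re
from collections import Counter

# Pattern inferred from the log snippet in this thread:
# [timestamp][level icon][container label] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[^\]]+)\]\[(?P<name>[^\]]+)\] (?P<msg>.*)$"
)


def count_failed_starts(log_text: str) -> Counter:
    """Count start attempts that failed with 'No such container', per label."""
    failures = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "No such container" in match.group("msg"):
            failures[match.group("name")] += 1
    return failures


# Three lines copied from the snippet above.
sample = """\
[24.06.2024 03:03:04][ℹ️][network-container] Starting network-container... (try #1) done!
[24.06.2024 03:03:08][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:03:13][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
"""

for name, count in count_failed_starts(sample).items():
    print(f"{name}: {count} failed start attempt(s)")
```

Lines that do not match the assumed pattern are simply ignored, so the helper can be fed a full log file.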

Again, sorry for the assumptions!

@Commifreak
Owner

That's weird. But the names do not change, do they?

@lonzodelorana
Author

Nope, the names are the same. The rebuild only takes 30-60 seconds or so for all of the dependent containers.

The container ID does change, though, for all of the dependent containers.
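Since the names stay stable while the rebuild takes up to a minute, one possible approach is to poll for the container by name before trying to start it, instead of relying on a fixed wait. The Python sketch below is only an illustration of that idea and is not how the plugin works (the plugin is not written in Python). `container_exists` and `wait_for_container` are hypothetical names, and the 90-second timeout is an assumption loosely based on the 30-60 seconds mentioned above.

```python
import subprocess
import time
from typing import Callable


def container_exists(name: str) -> bool:
    """Return True if `docker container inspect NAME` exits successfully."""
    result = subprocess.run(
        ["docker", "container", "inspect", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_container(
    name: str,
    exists: Callable[[str], bool] = container_exists,
    timeout: float = 90.0,   # assumption: a bit longer than the reported rebuild time
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the named container exists or the timeout has elapsed."""
    deadline = clock() + timeout
    while True:
        if exists(name):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run something like `wait_for_container("container1")` and only attempt the start once it returns `True`. The `exists`, `sleep`, and `clock` parameters are injectable so the loop can be tested without a Docker daemon.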

@Commifreak
Owner

The container ID is not used for those actions. Please submit a debug log and share its ID. There is some internal debug output for that case.

@lonzodelorana
Author

lonzodelorana commented Jul 16, 2024

Hi,

Sorry for the delay. The most recent debug log didn't have the issue, but it reoccurred last night, so I've submitted the log. The ID is:

24d6ffd8-670d-4af5-a111-ae1b129dccfb

@Commifreak
Owner

Commifreak commented Jul 16, 2024

I need to adjust my debug things inside the plugin to get a deeper look. Stay tuned.

@ChirpyTurnip

I'm not sure if I have a similar problem or not. In my case everything was working fine for a few weeks (new user), and last night I received 27 emails telling me there were "back up issues" because some Docker containers couldn't be restarted as "they didn't exist".

Event: Appdata Backup
Subject: [AppdataBackup] Error!
Description: Please check the backup log!
Importance: alert

Container did not started after multiple tries, skipping.

Six containers were impacted, and all of them were configured with no network of their own: they were slaved off GlueTun rather than being directly (or indirectly) connected to the network. All the other containers started with no issue.

The impacted containers started perfectly normally when manually started.

Logs uploaded - 6c11d71e-29a4-461d-9b1d-2d875f7d6f2e.

Just in case it helps.

@Commifreak
Owner

> The impacted containers started perfectly normally when manually started.

That's the fun part: the start mechanism is the same one Unraid uses. Not sure why it did not work during backup :/

@oldsweatyman

Having the same error:

Debug Log: 8edca3b9-9b4e-4833-9c8b-a291814ce2cc

Same thing here: this happens only with containers routed through another container for their network. It seems like it might just need a delay for the network to build across the containers before starting them. The error might be that the network container (in my case, qbittorrent with VPN) doesn't exist yet (it is rebuilding the network), as opposed to the containers that can't start not existing (e.g., the ones that have "--net=container:qbittorrent").
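To make the dependency idea concrete, here is a minimal Python sketch that orders containers so that a network provider comes before anything using `--net=container:<ref>`. It works on a plain mapping of container name to network mode string, which is an assumption for illustration; `network_parent` and `start_order` are hypothetical helpers, and the sketch assumes the reference is a container name. If the stored reference were an ID from before the provider was recreated, it would not match any name here.

```python
from typing import Dict, List, Optional


def network_parent(network_mode: str) -> Optional[str]:
    """Return the referenced container for 'container:<ref>', else None."""
    prefix = "container:"
    if network_mode.startswith(prefix):
        return network_mode[len(prefix):]
    return None


def start_order(network_modes: Dict[str, str]) -> List[str]:
    """Order containers so each network provider precedes its dependents."""
    ordered: List[str] = []
    visiting = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular network dependency at {name}")
        visiting.add(name)
        parent = network_parent(network_modes[name])
        # Only follow references that point at a known container name.
        if parent in network_modes:
            visit(parent)
        visiting.discard(name)
        ordered.append(name)

    for name in network_modes:
        visit(name)
    return ordered


# Hypothetical setup: container1 shares qbittorrent's network stack.
modes = {
    "container1": "container:qbittorrent",
    "qbittorrent": "bridge",
    "other": "host",
}
print(start_order(modes))
```

Combined with a wait between the provider and its dependents, an ordering like this would test the theory that the dependents fail because the provider is not ready yet.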

@Commifreak
Owner

Maybe. But the script does not wait after qbittorrent_music. Is there a delay set at all?

@Commifreak
Owner

But "No such container" says that the container itself is not there. A startup error caused by a missing network would produce a different message, as far as I know.

@ChirpyTurnip

> Having the same error:
>
> Debug Log: 8edca3b9-9b4e-4833-9c8b-a291814ce2cc
>
> Same thing about this happening only with containers routed through another container for its network. Seems like it might just need a delay for the network to build across the containers and then start? The error might be that the network container (in my case, qbittorrent with VPN) doesn't exist yet (it is rebuilding the network) as opposed to the containers that can't start not existing (e.g., the ones that have "--net=container:qbittorrent")

I'm not sure. All the containers linked to GlueTun immediately flag as "waiting for rebuild" when GlueTun is updated (so they all know they have to rebuild), and my starts are staggered with 10s delays. If this theory holds, then maybe the first one or two containers would fail to start, but the rest would work fine as they would have had time to do their thing. Yet none of my containers restart. Conversely, if one container rebuilding on start blocks the subsequent containers from starting, then at least one container should have started, with the others either all skipped or, if the delay between failed start attempts was being honoured, some started and some not. I find the actual rebuild process is very quick as there is no image to pull: within seconds all of mine are rebuilt and ready to go.

@PilaScat

PilaScat commented Jan 1, 2025

same problem
debug id 868ba415-051b-42e2-b406-3f626a882054
