Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful Shutdown ignores Websocket handler task #3003

Open
SZenglein opened this issue Oct 21, 2024 · 5 comments
Open

Graceful Shutdown ignores Websocket handler task #3003

SZenglein opened this issue Oct 21, 2024 · 5 comments

Comments

@SZenglein
Copy link

Bug Report

Version

axum 0.7.4

Platform

Linux [...] 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux

Crates

axum

Description

I have the following scenario:

  • I need websocket connections which contain shared references (Arc) to the application state
  • Sometimes I need to stop the service and be sure all shared state is dropped
  • I am using graceful shutdown

The concrete reason the state needs to be dropped is that it may contain a file handle that is exclusive and that I want to access after shutting down.

The problem:
The task spawned in ws.on_upgrade is not connected to the graceful shutdown future at all. I can listen to a global CancellationToken and quit the connection myself, but that still does not guarantee me that the connection is closed when the graceful shutdown future completes.
If I don't immediately quit the application after the server quits, the websocket connection happily continues running. Try placing a sleep below the axum::serve.

I would expect websockets to behave the same as SSE, i.e. blocking graceful shutdown until completion (#2673).

@mladedav
Copy link
Collaborator

Try placing a sleep below the axum::serve.

Bellow what axum::serve? Did you mean to add an example of the issue?

@SZenglein
Copy link
Author

SZenglein commented Oct 22, 2024

Sorry, I thought it was more obvious what I meant. I am simply talking about the end of main in a typical axum application. The last command is typically serving the axum app, after which the tokio runtime is immediately shut down.

The purpose of a sleep would simply be a quick hack for keeping the tokio runtime alive a bit longer. Then it is easy to see that websocket connections are kept open for too long.

I have created a quick example by combining the websocket and graceful-shutdown examples. It demonstrates quite well that there is no connection whatsoever between the graceful shutdown future and the websocket handlers.
https://github.com/SZenglein/axum/tree/websocket_shutdown_example

@SZenglein
Copy link
Author

I have now solved/worked around this issue by tracking all websocket handlers using tokio_util::task::TaskTracker along with CancellationTokens.
I had other tasks to track anyway and was using a TaskTracker already.

I still think axum should track the future spawned inside on_upgrade

@mladedav
Copy link
Collaborator

Thanks, I see what the issue is now.

When we upgrade the connection, it seems the whole stack considers the original connection as closed but its io is transferred and we don't track that anymore. Propagating some kind of a channel there so that we keep track of it for the purpose of graceful shutdown should be easy enough.

That said, I don't think that's where the story ends because what I think will happen is all middlewares will also run as if the request has finished (which it kind of did since we produced a response, but also not quite). This could break for example layers trying to limit the number of concurrent connections or maybe some tracing layers.

@SZenglein
Copy link
Author

That said, I don't think that's where the story ends because what I think will happen is all middlewares will also run as if the request has finished (which it kind of did since we produced a response, but also not quite). This could break for example layers trying to limit the number of concurrent connections or maybe some tracing layers.

That's an interesting argument. As I understand it, upgraded websocket connections completely leave the axum (or tower I guess) model of layers and services behind and do their own thing?

If that is the case, this should be very clearly documented. An application enforcing a connection limit only for that to be broken by websockets is not great.

Another solution could be to model websockets as a proper service, but I am not deep enough into tower concepts to comment on that. Could you model a service with both the Request and the Response as streams? SSE at least already works similarly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants