Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked transmission lasts longer than timeout #4214

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

sergiuszkierat
Copy link
Member

@sergiuszkierat sergiuszkierat commented Dec 19, 2024

Why I did it?
In order to have a test which might confirm an issue
with an interrupted request

How I did it:
I prepared NettyCatsRequestTimeoutTest with the folloing test scenario:

  • send first chunk (100 bytes)
  • sleep
  • send second chunk (100 bytes)

Closes #4169

@adamw
Copy link
Member

adamw commented Dec 20, 2024

There are two differences compared to the original report:

  1. it uses the play interpreter, not the netty one (there's netty underneath as well, but the interpreter is different)
  2. the request there is interrupted by the client, not by the server because of a timeout. However ... I think we won't be able to simulate this exact scenario, so a request timeout should do the trick as well

@sergiuszkierat sergiuszkierat force-pushed the fix/chunked_transmission_timeout branch from 49a1e12 to ad42e13 Compare December 23, 2024 15:38
@sergiuszkierat
Copy link
Member Author

There are two differences compared to the original report:

  1. it uses the play interpreter, not the netty one (there's netty underneath as well, but the interpreter is different)
  2. the request there is interrupted by the client, not by the server because of a timeout. However ... I think we won't be able to simulate this exact scenario, so a request timeout should do the trick as well

Thanks for your precise comment. I've rewritten it to PlayServerTest. Anyway, there is no luck in reproducing issue.

.streamBody(Fs2Streams[IO])(inputStream)
.send(backend)
.map{ response =>
response.code shouldBe StatusCode.Ok
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm well if this test passes, something is wrong - we set the timeout to 1s, so we should never receive a response if it takes 2s to send it? unless the request timeout is for something else?

anyway, this doesn't test the scenario from the test case - where the transmission is interrupted half-way because of connection problems; I don't know if we can simulate this in a test case, but using a timeout is a good approximation. But probably a good way to check if we can at all reproduce the bug is to run: a long-running client sender process; a server process; then kill -9 the client process when it's half-way sending the data, and seeing on the server if received the incomplete data in the server logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have given it another try as you suggested. There are playServer and longLastingClient but I don't know what's wrong with that approach. Suggestions are welcome 💡

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you write "I don't know what's wrong with this approach ", do you mean that it works as expected (that is: you run both, interrupt the client brutally after some time, and the server properly closes the connection), or is there something else that's wrong?

Copy link
Member Author

@sergiuszkierat sergiuszkierat Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying with the following steps:

  1. $ scala-cli run playServer.scala
[playServer-pekko.actor.default-dispatcher-4] INFO org.apache.pekko.event.slf4j.Slf4jLogger - Slf4jLogger started
[application-pekko.actor.default-dispatcher-6] INFO org.apache.pekko.event.slf4j.Slf4jLogger - Slf4jLogger started
[main] INFO play.api.Play - Application started (Dev) (no global state)
[main] INFO play.core.server.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
  1. $ scala-cli run longLastingClient.scala
ByteString(65, 65, 65, 65, 65, 65, 65, 65, 65, 65) 17:46:14.514330
ByteString(65, 65, 65, 65, 65, 65, 65, 65, 65, 65) 17:46:15.534314
ByteString(65, 65, 65, 65, 65, 65, 65, 65, 65, 65) 17:46:16.553033
ByteString(65, 65, 65, 65, 65, 65, 65, 65, 65, 65) 17:46:17.573135
....
  1. server side
Received 10000 bytes, Source(SourceShape(Map.out(1467551936))) bytes in total
  1. $ ps aux | grep longLastingClient | awk '{print $2}' | head -n 1 | xargs kill -9

  2. and nothing new (error/exception/whatever) on server side 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lack of exception/error might be the problem here ;) but first there are two problems in the code:

  1. in the client code, you claim to send 10000 bytes in the content-length, but in fact you might send more?
  2. in the server code, you do stream.map(_.length), which just creates a Source[Long], that is a description of a stream that produces lenghts of received byte-strings (byte chunks). You never run (receive) the stream, and that's where you'd expect to see errors (when the stream is being run)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ added pointed adjustments and the result is the following :

 $ scala-cli run playServer.scala
[playServer-pekko.actor.default-dispatcher-4] INFO org.apache.pekko.event.slf4j.Slf4jLogger - Slf4jLogger started
[application-pekko.actor.default-dispatcher-4] INFO org.apache.pekko.event.slf4j.Slf4jLogger - Slf4jLogger started
[main] INFO play.api.Play - Application started (Dev) (no global state)
[main] INFO play.core.server.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
$ scala-cli run longLastingClient.scala
Chunk 1 sent 13:00:13.650915
Chunk 2 sent 13:00:14.668797
Chunk 3 sent 13:00:15.690020
Chunk 4 sent 13:00:16.710184
Chunk 5 sent 13:00:17.730033
Chunk 6 sent 13:00:18.748416
Chunk 7 sent 13:00:19.769451
Chunk 8 sent 13:00:20.789355
  1. killing client

  2. server side

Stream finished: 800/10000 transmitted

😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I suppose, that's the problem as originally reported?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now try to fix it, and/or reproduce using a test :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I suppose that we should return an error with a descriptive message i.e. java.lang.IllegalStateException: Expected Content-Length: 10000 bytes, but only 800 were written, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since the client interrupted the transmission, I think it's reasonable to expect that the stream will fail? With any exception for a start.

@sergiuszkierat sergiuszkierat force-pushed the fix/chunked_transmission_timeout branch 3 times, most recently from 4281e96 to f1aad13 Compare January 2, 2025 12:15
*Why I did it?*
In order to have a test which might confirm an issue
with an interrupted request

*How I did it:*
I prepared `NettyCatsRequestTimeoutTest` with the folloing test scenario:
 - send first chunk (100 bytes)
 - sleep
 - send second chunk (100 bytes)
- add PlayServerTest instead of NettyCatsServerTest
- improve fs2 implementation
This reverts commit 02ba4b0.
@sergiuszkierat sergiuszkierat force-pushed the fix/chunked_transmission_timeout branch from f1aad13 to c8f0be9 Compare January 16, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Do not parse interrupted request
2 participants