Kinesis Async Client hangs after java.io.IOException: Response had content-length of 28 bytes, but only received 0 bytes before the connection was closed #4354

Closed
MichalZalewskiRASP opened this issue Aug 26, 2023 · 8 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@MichalZalewskiRASP

Describe the bug

I get the same defect as described in #3335 when I use the SubscribeToShard API. It hangs, and no further visitor.visit() call is executed after the Kinesis Async Client throws
java.io.IOException: Response had content-length of 28 bytes, but only received 0 bytes before the connection was closed.
Stream consumption halts afterwards, even though I registered an onError handler.

Expected Behavior

The SubscribeToShard API should be able to resume stream consumption after encountering IOException.

Current Behavior

The thread hangs and the client does not continue to consume the stream.

Reproduction Steps

It is difficult to reproduce: it happens randomly, about once per week, when the response fails the content-length validation.
My dependencies:

<dependency>
    <groupId>software.amazon.kinesis</groupId>
    <artifactId>amazon-kinesis-client</artifactId>
    <version>2.5.2</version>
</dependency>
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>url-connection-client</artifactId>
    <version>2.20.123</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sts</artifactId>
    <version>2.20.123</version>
</dependency>

Here is my setup.

I call (in Kotlin):

kinesisClient.subscribeToShard(requestToSubscribe, responseHandler).get()

whereby:

  • kinesisClient is software.amazon.awssdk.services.kinesis.KinesisAsyncClient.
  • `requestToSubscribe` is defined as:
SubscribeToShardRequest.builder()
        .consumerARN(<some efo consumer arn>)
        .shardId(<some shard name>)
        .startingPosition(<some starting position>)
        .build()
  • `responseHandler` is defined as:
SubscribeToShardResponseHandler
        .builder()
        .onResponse{
            log.info("Subscription to Kinesis stream started {}", it)
        }
        .onComplete {
            log.info("Subscription to Kinesis stream completed")
        }
        .onError { t: Throwable ->
            when (t) {
                is ResourceInUseException -> log.debug("Other instance is already subscribed to Kinesis: " + t.message)
                else -> log.error("Error when reading data from processor: stack={}", t.stackTrace)
            }
        }.subscriber(<some visitor>)
        .build()
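
The bare `.get()` in the call above waits without a time limit, so a subscription future that never completes blocks the calling thread forever. A possible workaround, sketched here in plain Java with only JDK classes so that it runs without the SDK, is to bound the wait and start a new subscription when the previous one fails or stalls. `subscribeWithRetry`, `subscribeOnce`, and the fake supplier are hypothetical names for illustration; `subscribeOnce` stands in for a call such as `kinesisClient.subscribeToShard(requestToSubscribe, responseHandler)`. Whether cancelling the future releases the underlying connection in the real client is an assumption to verify.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ResubscribeSketch {

    /**
     * Hypothetical helper: starts a subscription, waits at most `timeout` for it
     * to finish, and starts a new one if it failed or did not finish in time.
     * Returns true once a subscription completes normally, false if all
     * attempts were used up or the calling thread was interrupted.
     */
    public static boolean subscribeWithRetry(Supplier<CompletableFuture<Void>> subscribeOnce,
                                             Duration timeout,
                                             int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<Void> subscription = subscribeOnce.get();
            try {
                // Bounded wait instead of a bare get(), so a stalled future cannot block forever.
                subscription.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
                return true;
            } catch (ExecutionException e) {
                // The subscription failed (for example with an IOException); try again.
            } catch (TimeoutException e) {
                // No completion in time: give up on this future and try again.
                subscription.cancel(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for the real subscribe call: fails once, then completes.
        Supplier<CompletableFuture<Void>> fake = () -> {
            if (calls.incrementAndGet() == 1) {
                return CompletableFuture.failedFuture(new IOException("connection closed"));
            }
            return CompletableFuture.completedFuture(null);
        };
        boolean ok = subscribeWithRetry(fake, Duration.ofSeconds(1), 3);
        System.out.println("completed=" + ok + " attempts=" + calls.get());
        // prints "completed=true attempts=2"
    }
}
```

In a real consumer the timeout would need to exceed the normal lifetime of a SubscribeToShard subscription, and the loop would typically resubscribe after a normal completion as well, starting from the last sequence number seen.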

Possible Solution

No response

Additional Information/Context

Stack Trace:
java.io.IOException: Response had content-length of 28 bytes, but only received 0 bytes before the connection was closed.

software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.validateResponseContentLength(ResponseHandler.java:163)
software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$700(ResponseHandler.java:75)
software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:369)
software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.complete(HandlerPublisher.java:447)
software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.channelInactive(HandlerPublisher.java:430)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
io.netty.channel.AbstractChannelHandlerContext
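
For readers unfamiliar with the message, the top frame (`validateResponseContentLength`) suggests a check that compares the declared Content-Length header with the number of body bytes that arrived before the channel went inactive. The following Java sketch illustrates that kind of check. It is not the SDK's code: the class and method names are hypothetical, and only the message text is taken from the trace above.

```java
import java.io.IOException;

public class ContentLengthCheckSketch {

    /**
     * Hypothetical check: throws if fewer body bytes arrived than the
     * Content-Length header declared.
     */
    public static void validate(long declaredLength, long receivedBytes) throws IOException {
        if (receivedBytes < declaredLength) {
            throw new IOException("Response had content-length of " + declaredLength
                    + " bytes, but only received " + receivedBytes
                    + " bytes before the connection was closed.");
        }
    }

    public static void main(String[] args) {
        try {
            // Same numbers as in the report: 28 bytes declared, 0 received.
            validate(28, 0);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Read this way, the exception means the connection closed before any of the 28 declared body bytes were delivered.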

AWS Java SDK version used

2.20.43 / kcl 2.5.2

JDK version used

11.0.9

Operating System and version

linux (different versions) x86_64

@MichalZalewskiRASP added the bug (This issue is a bug.) and needs-triage (This issue or PR still needs to be triaged.) labels Aug 26, 2023
@debora-ito
Member

@MichalZalewskiRASP acknowledged.

We will investigate, but, as you mentioned, we also find this hard to reproduce.
If you or anyone else is experiencing this issue and can reproduce it reliably, please send us repro code.

@debora-ito added the p2 (This is a standard priority issue) label and removed the needs-triage (This issue or PR still needs to be triaged.) label Aug 30, 2023
@debora-ito
Member

@MichalZalewskiRASP

A change (#4402) was released in SDK version 2.20.144, please try it out and let us know if you still see the exception.

@debora-ito added the closing-soon (This issue will close in 4 days unless further comments are made.) label Sep 11, 2023
@MichalZalewskiRASP
Author

MichalZalewskiRASP commented Sep 13, 2023

Hi @debora-ito, I have just deployed the application with the new dependency. I need a few days to check, as this bug happens once every few days. I will get back to you. Thank you for taking care of my request.

@github-actions bot removed the closing-soon (This issue will close in 4 days unless further comments are made.) label Sep 13, 2023
@antovespoli

Hi @debora-ito, I validated the fix and it has been holding correctly since applying it a few weeks back. This is to confirm that the fix is effective.

@gaoyibin0001

I still face the same issue when using flink-connector-kinesis:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kinesis</artifactId>
    <version>5.0.0-1.20</version>
</dependency>

Affected applications will experience the following symptoms:

Flink job is in RUNNING state, but not processing data;

There are no job restarts;

Checkpoints are timing out.

@debora-ito
Member

Hi @gaoyibin0001

Looking at the pom file, it looks like flink-connector-kinesis 5.0.0-1.20 shades the Java SDK:

<relocation>
    <pattern>software.amazon</pattern>
    <shadedPattern>org.apache.flink.kinesis.shaded.software.amazon</shadedPattern>
</relocation>

so I recommend you reach out to the Apache Flink support team to help you troubleshoot the issue.
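
To illustrate why shading matters here: a relocation rewrites package prefixes inside the connector's jar, so the connector uses its own bundled copy of the SDK under a different package name, and upgrading the `software.amazon.awssdk` dependency in the application does not change that copy. The Java sketch below is a simplified model of prefix relocation using the pattern from the pom above. `RelocationSketch` and `relocate` are hypothetical names, and the real shade plugin applies further rules (includes, excludes, resource paths) that are not modeled.

```java
public class RelocationSketch {

    /**
     * Simplified model of a shade relocation: if the class name starts with
     * `pattern`, swap that prefix for `shadedPattern`; otherwise leave it alone.
     */
    public static String relocate(String className, String pattern, String shadedPattern) {
        if (className.startsWith(pattern)) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String relocated = relocate(
                "software.amazon.awssdk.services.kinesis.KinesisAsyncClient",
                "software.amazon",
                "org.apache.flink.kinesis.shaded.software.amazon");
        System.out.println(relocated);
        // prints "org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.KinesisAsyncClient"
    }
}
```

Under this model, the SDK version that matters for the connector is the one bundled into the shaded jar at build time, which is why the question goes to the connector's maintainers.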

@debora-ito
Member

Closing this as the original issue was resolved.


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
