S3 putObject via RequestBody.fromContentProvider yields an object with 0 bytes #5824

Closed
tarehart opened this issue Jan 25, 2025 · 3 comments
Labels
bug (This issue is a bug.) · p1 (This is a high priority issue.) · potential-regression (Marking this issue as a potential regression to be checked by a team member.)

Comments

@tarehart

Describe the bug

S3 putObject via RequestBody.fromContentProvider yields an object with 0 bytes. The operation completes as though it were successful, which increases the harm of this bug since there's potential for undetected data loss.

I believe this is related to #5801. The problem goes away when I set the environment variable AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED.

This started with 2.30.0.
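
For reference, the per-client equivalent of that environment variable looks roughly like the sketch below. This is untested here and assumes the requestChecksumCalculation builder option and the RequestChecksumCalculation enum introduced alongside the 2.30.x checksum changes:

import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class ChecksumWorkaround {
    public static void main(String[] args) {
        // Per-client equivalent of AWS_REQUEST_CHECKSUM_CALCULATION=WHEN_REQUIRED
        // (assumes the builder option shipped with the 2.30.x flexible-checksum changes).
        try (var s3Client = S3Client.builder()
                .region(Region.US_EAST_1)
                .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
                .build()) {
            // Uploads made with this client should only compute checksums when required.
        }
    }
}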

Regression Issue

  • [x] Select this option if this issue appears to be a regression.

Expected Behavior

S3 putObject via RequestBody.fromContentProvider uploads all content from the input stream, yielding an object with non-zero size in the S3 bucket.

Current Behavior

The object in S3 has zero bytes, despite the operation reporting success.

Reproduction Steps

package repro;

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.net.URL;


public class UploadRepro {

    public static void main(String[] args) throws IOException {
        final var bucket = "XXXXXXXXXXXXXX";
        final var key = "XXXXXXXX";
        
        final var url = new URL(
            "https://raw.githubusercontent.com/aws/aws-sdk-java-v2/ad35231f768e1bb68e6f77cb29f69d1a7278931e/.changes/next-release/feature-AmazonS3-c101d4d.json"
        );

        var s3Client = S3Client.builder()
                .credentialsProvider(ProfileCredentialsProvider.builder().profileName("Dev-Admin").build())
                .region(Region.US_EAST_1)
                .build();

        try (var stream = url.openStream()) {
            s3Client.putObject(
                PutObjectRequest.builder().bucket(bucket).key(key).build(),
                RequestBody.fromContentProvider(() -> stream, "application/json")
            );
        }

        var length = s3Client.headObject(
            HeadObjectRequest.builder().bucket(bucket).key(key).build()
        ).contentLength();

        System.out.println(length); // prints 0
    }
}

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.30.1

JDK version used

openjdk version "17.0.13"

Operating System and version

macOS 15.1.1

@tarehart added the bug and needs-triage labels on Jan 25, 2025
@bhoradc added the p1 and potential-regression labels and removed the needs-triage label on Jan 27, 2025
@bhoradc

bhoradc commented Jan 27, 2025

Hi @tarehart,

Thank you for reporting the issue. I am able to reproduce the behaviour you mentioned. It appears to be specifically related to using the RequestBody.fromContentProvider() method to upload to S3.

Minimal reproducible code sample:

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws MalformedURLException {

        final var bucket = "****";
        final var key = "regression.json";

        final var url = new URL(
                "https://raw.githubusercontent.com/aws/aws-sdk-java-v2/ad35231f768e1bb68e6f77cb29f69d1a7278931e/.changes/next-release/feature-AmazonS3-c101d4d.json"
        );

        var s3Client = S3Client.builder()
                .region(Region.US_EAST_1)
                .build();

        try (var stream = url.openStream()) {
            s3Client.putObject(
                    PutObjectRequest.builder().bucket(bucket).key(key).build(),
                    RequestBody.fromContentProvider(() -> stream, "application/json")
            );
        } catch (IOException e) {
          throw new RuntimeException(e);
        }

        var length = s3Client.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(key).build()
        ).contentLength();

        System.out.println(length); // prints 0
    }
}

From Java SDK v2.30.0 onwards, S3 objects are uploaded with 0 bytes despite the operation reporting success. The same code sample works fine for version 2.29.52 and prior.

We are looking into this issue further.

Regards,
Chaitanya

@agicquelamz

Hi @tarehart,

It looks like the implementation of ContentStreamProvider in your sample code does not satisfy its API contract. Per the interface's documentation [1], the result of newStream() must always start at the beginning of the data, and must return the same content over all invocations. Depending on whether the stream implementation supports mark and reset, this requirement can be satisfied in a few different ways:

Use mark and reset

Here, we use mark(int) in the constructor before reading starts to ensure that we can reset back to the beginning, and then on each invocation of newStream(), we ensure that the stream is reset.

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import software.amazon.awssdk.http.ContentStreamProvider;

public class MyContentStreamProvider implements ContentStreamProvider {
    // mark() read limit; must cover all bytes read before reset() is called.
    private static final int MAX_LEN = 8 * 1024 * 1024;

    private final InputStream contentStream;

    public MyContentStreamProvider(InputStream contentStream) {
        this.contentStream = contentStream;
        this.contentStream.mark(MAX_LEN);
    }

    @Override
    public InputStream newStream() {
        try {
            contentStream.reset(); // rewind to the beginning on every invocation
            return contentStream;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

Use buffering if mark and reset are not available

If your stream doesn't support mark and reset directly, you can still use the above solution by first wrapping the stream in a BufferedInputStream:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import software.amazon.awssdk.http.ContentStreamProvider;

public class MyContentStreamProvider implements ContentStreamProvider {
    private static final int MAX_LEN = 8 * 1024 * 1024; // mark() read limit; must cover the content

    private final BufferedInputStream contentStream; // BufferedInputStream always supports mark/reset

    public MyContentStreamProvider(InputStream contentStream) {
        this.contentStream = new BufferedInputStream(contentStream);
        this.contentStream.mark(MAX_LEN);
    }

    @Override
    public InputStream newStream() {
        try {
            contentStream.reset();
            return contentStream;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

Always return a new stream, and close the previous one

A simpler approach is to obtain a new stream to your data on each invocation and close the previous one:

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import software.amazon.awssdk.http.ContentStreamProvider;

public class MyContentStreamProvider implements ContentStreamProvider {
    private InputStream contentStream;

    @Override
    public InputStream newStream() {
        try {
            if (contentStream != null) {
                contentStream.close();
            }
            contentStream = openStream(); // placeholder: open a fresh stream to your data
            return contentStream;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

[1] https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/http/ContentStreamProvider.html
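
Applied to the repro in this issue, a minimal (untested) sketch of the third approach would be to open a fresh stream from the URL on every newStream() invocation, instead of handing the SDK a single already-opened stream:

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

public class UploadWithFreshStreams {
    public static void main(String[] args) throws IOException {
        final var bucket = "XXXXXXXXXXXXXX"; // placeholder, as in the original repro
        final var key = "XXXXXXXX";          // placeholder, as in the original repro
        final var url = new URL(
            "https://raw.githubusercontent.com/aws/aws-sdk-java-v2/ad35231f768e1bb68e6f77cb29f69d1a7278931e/.changes/next-release/feature-AmazonS3-c101d4d.json"
        );

        try (var s3Client = S3Client.builder().region(Region.US_EAST_1).build()) {
            s3Client.putObject(
                PutObjectRequest.builder().bucket(bucket).key(key).build(),
                // Each call to newStream() opens a fresh stream, so retries and
                // checksum passes always read the content from the beginning.
                RequestBody.fromContentProvider(() -> {
                    try {
                        return url.openStream();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, "application/json")
            );
        }
    }
}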


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
