
Fixed an issue that could cause checksum mismatch errors in S3 uploads. #5836

Merged
5 commits merged into master on Jan 30, 2025

Conversation

@millems (Contributor) commented Jan 29, 2025

This error is sometimes encountered when a customer uses:

  1. The AsyncS3Client
  2. A ChecksumAlgorithm of SHA1 or SHA256 (instead of the default CRC32)
  3. Parallel uploads

The root cause was the SDK using thread locals to cache the SHA1 and SHA256 message digest implementations. This meant that if a single event loop thread was processing multiple requests, those requests would share the same digest instance when calculating their checksums.

This PR updates the SHA1 and SHA256 (and MD5, though it's not used by S3) digest caching to use a LIFO cache instead.
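
To make the failure mode concrete, here is a minimal, self-contained sketch (not SDK code) of why a thread-local digest breaks under the async client: two requests multiplexed onto the same event loop thread receive the same MessageDigest instance, so their update() calls interleave and the resulting checksum matches neither payload.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Minimal sketch of the hazard: a thread-local digest shared by two logical requests
// that happen to run on the same event loop thread.
public class SharedDigestHazard {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        ThreadLocal<MessageDigest> cachedDigest = ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("SHA-256");
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        });

        byte[] requestABody = "request-A-body".getBytes(StandardCharsets.UTF_8);
        byte[] requestBBody = "request-B-body".getBytes(StandardCharsets.UTF_8);

        // Both "requests" run on this one thread, so they get the same cached instance.
        MessageDigest digestForA = cachedDigest.get();
        MessageDigest digestForB = cachedDigest.get();
        digestForA.update(requestABody);
        digestForB.update(requestBBody); // silently mixed into request A's digest state

        byte[] mixedChecksum = digestForA.digest();
        byte[] expectedChecksum = MessageDigest.getInstance("SHA-256").digest(requestABody);
        System.out.println(Arrays.equals(mixedChecksum, expectedChecksum)); // prints false
    }
}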

@millems millems requested a review from a team as a code owner January 29, 2025 00:43

// Avoid over-caching after large traffic bursts. The maximum chosen here is arbitrary. It's also not strictly
// enforced, since these statements aren't synchronized.
if (digestCache.size() <= MAX_CACHED_DIGESTS) {
Contributor
size() may be expensive; should we track the size using an atomic integer instead?


How would you coordinate the atomic size and the concurrent deque?

Contributor Author

I'll switch to a linked blocking deque. It uses locking, but it's still likely faster than creating new message digests, and it has a constant-time size() method. We can benchmark later to verify this is the fastest approach.

Is that reasonable?
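
For illustration, here is a rough sketch of the linked-blocking-deque approach described above. The class and constant names (DigestCache, MAX_CACHED_DIGESTS) are illustrative rather than the SDK's actual identifiers; the point is that a capacity-bounded LinkedBlockingDeque enforces the maximum itself via offerFirst(), and its size() is constant time.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of a LIFO digest cache backed by a capacity-bounded LinkedBlockingDeque.
final class DigestCache {
    private static final int MAX_CACHED_DIGESTS = 100; // arbitrary bound, as noted in the diff comment

    private final LinkedBlockingDeque<MessageDigest> cache = new LinkedBlockingDeque<>(MAX_CACHED_DIGESTS);
    private final String algorithmName;

    DigestCache(String algorithmName) {
        this.algorithmName = algorithmName;
    }

    MessageDigest acquire() {
        MessageDigest digest = cache.pollFirst(); // LIFO: reuse the most recently released digest
        if (digest != null) {
            digest.reset(); // make sure no state from a previous request leaks into this one
            return digest;
        }
        try {
            return MessageDigest.getInstance(algorithmName);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("Unable to fetch message digest instance for Algorithm "
                                       + algorithmName + ": " + e.getMessage(), e);
        }
    }

    void release(MessageDigest digest) {
        // offerFirst drops the digest when the deque is already at capacity, so the bound is
        // enforced by the deque itself rather than by an unsynchronized size() check.
        cache.offerFirst(digest);
    }
}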

DigestThreadLocal(String algorithmName) {
this.algorithmName = algorithmName;
/**
* Retrieve the message digest bytes. This will close the message digest when invoked. This is because the underlying
Contributor

Question: where do we close messageDigest in this method?

Contributor Author

Good catch, that needs a test added.
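
As context for the javadoc quoted above, here is a hedged sketch of a wrapper whose digest() call also closes it. The SDK's real CloseableMessageDigest is constructed from a cloned digest and its internals (and the truncated rationale in the javadoc) may differ; passing the cache into the constructor here is only to keep the sketch self-contained.

import java.security.MessageDigest;
import java.util.Deque;

// Sketch only: a message digest wrapper that returns its delegate to a cache when closed.
// digest() closes the wrapper as well, since MessageDigest.digest() resets the delegate's
// state and it cannot continue the same hash afterwards anyway.
final class CloseableMessageDigest implements AutoCloseable {
    private final MessageDigest delegate;
    private final Deque<MessageDigest> cache;
    private byte[] result;
    private boolean closed;

    CloseableMessageDigest(MessageDigest delegate, Deque<MessageDigest> cache) {
        this.delegate = delegate;
        this.cache = cache;
    }

    void update(byte[] bytes) {
        delegate.update(bytes);
    }

    byte[] digest() {
        if (result == null) {
            result = delegate.digest();
            close(); // hand the delegate back to the cache as soon as the bytes are retrieved
        }
        return result;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            cache.offerFirst(delegate);
        }
    }
}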


throw new RuntimeException("Unable to fetch message digest instance for Algorithm "
+ algorithmName + ": " + e.getMessage(), e);
return new CloseableMessageDigest((MessageDigest) digest.get().clone());
} catch (CloneNotSupportedException e) { // should never occur

Any time a comment says "should never occur", it seems to happen. Why can't this method declare that it throws CloneNotSupportedException instead?

Contributor Author

CloneNotSupportedException is a checked exception. Because the SDKs don't throw checked exceptions, if this method declared it, the callers would need to wrap it in an unchecked exception themselves.

Contributor Author

I've improved the exception message if this scenario does happen.
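
A minimal sketch of the wrapping pattern being described, with invented names (DigestCloner, cloneOf): the checked CloneNotSupportedException is converted into an unchecked exception whose message names the algorithm, so SDK callers never have to handle the checked type.

import java.security.MessageDigest;

// Sketch only: wrap the checked CloneNotSupportedException in an unchecked exception.
final class DigestCloner {
    static MessageDigest cloneOf(MessageDigest prototype) {
        try {
            // Clone the cached digest rather than looking one up via getInstance for every request.
            return (MessageDigest) prototype.clone();
        } catch (CloneNotSupportedException e) {
            // The JDK's standard SHA-1/SHA-256/MD5 digests support clone(), so reaching this
            // branch suggests an unusual provider; say which algorithm was involved.
            throw new IllegalStateException("MessageDigest for " + prototype.getAlgorithm()
                                            + " does not support cloning.", e);
        }
    }
}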

Comment on lines 106 to 107
// Avoid over-caching after large traffic bursts. The maximum chosen here is arbitrary. It's also not strictly
// enforced, since these statements aren't synchronized.
Contributor

Would it be possible to have a "prefilled" cache of lazy loaded digests, and then when we can't use a cached digest, we have a different instance that just closes and doesn't need to interact with the cache?

Contributor Author

That's an option. It might be trickier to decide on the size of the cache, and it means that failing to release one of those special "cached" digests back to the cache would have long-term performance implications. The advantage of this implementation is that the odd error that fails to release a digest back to the cache doesn't really hurt.

digestCache.addFirst(digest.get());
}
// Drop this digest if the cache is full.
digestCache.offerFirst(digest.get());
Contributor

Nice

@millems millems added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@millems millems added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@millems millems added this pull request to the merge queue Jan 30, 2025
Merged via the queue into master with commit c137348 Jan 30, 2025
18 checks passed