Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make createMPU Call Async #84

Merged
merged 27 commits into from
Dec 26, 2024
Merged

Make createMPU Call Async #84

merged 27 commits into from
Dec 26, 2024

Conversation

waahm7
Copy link
Contributor

@waahm7 waahm7 commented Dec 19, 2024

Description of changes:
This PR is just a refactor of the Upload APIs and doesn't make any functional changes.

  • This is a follow-up to Expose Download Metadata for Every Chunk #73 to return the handle ASAP for the UploadObject API as well. This approach will make it easier to implement features like reading the first part to determine PutObject vs MPU.
  • upload_handle.abort() now takes ownership of the handle instead of a reference. It makes more sense to take ownership so that customers can't call join or do other things with the handle after aborting, as it won't be valid.
  • I have added a TODO to make the completeMPU call asynchronous as well, instead of waiting for users to call join. From testing, completeMPU can be an expensive call for large objects.
  • upload_id was an Option because it won't be available for PutObject. I have moved it to the MultipartUploadData struct, and it will be non-optional wherever MPU is involved.
  • Move test_cancel_single_upload_via_multipart_upload test to upload_tests.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@waahm7 waahm7 changed the title WIP | Make createMPU call async Make createMPU Call Async Dec 20, 2024
@waahm7 waahm7 marked this pull request as ready for review December 20, 2024 00:11
@waahm7 waahm7 requested a review from a team as a code owner December 20, 2024 00:11
.github/workflows/ci.yml Show resolved Hide resolved
stream: InputStream,
part_size: u64,
upload_id: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't upload id part of mpu_data?

Also the whole point of UploadContext was so to thread through all the state needed for an operation...

Copy link
Contributor Author

@waahm7 waahm7 Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't upload id part of mpu_data?

Thanks, fixed by not passing upload_id explicitly.

Also the whole point of UploadContext was so to thread through all the state needed for an operation...

Yeah, but the problem was that we no longer have mutable access to UploadContext in the createMPU task, because we do not own the handle at that point.

@@ -252,8 +252,7 @@ async fn do_upload(args: Args) -> Result<(), BoxError> {
.bucket(bucket)
.key(key)
.body(stream)
.send()
.await?;
.initiate()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where'd you get the name initiate()? Is that a standard name used in similar libraries? Just want to be sure we pick something predictable for users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, Aaron suggested changing the name from send to avoid confusion with the Rust SDK at #73 (comment), and I couldn't think of something better than initiate/send.

Comment on lines +76 to 80
// TODO: We won't send completeMPU until customers join the future. This can create a
// bottleneck where we have many uploads not making the completeMPU call, waiting for the join
// to happen, and then everyone tries to do completeMPU at the same time. We should investigate doing
// this without waiting for join to happen.
complete_upload(self).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the TODO. If small-file PutObject can complete before the user calls join(), then CompleteMPU should be sent before the user calls join.

whatever we do, we should try to be consistent for small and large objects,

@@ -917,6 +917,6 @@ impl crate::operation::upload::input::UploadInputBuilder {
) -> Result<UploadHandle, crate::error::Error> {
let mut fluent_builder = client.upload();
fluent_builder.inner = self;
fluent_builder.send().await
fluent_builder.initiate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're changing the name-and-asyncness of UploadFluentBuilder::send() then we should change the name-and-asyncness of UploadInputBuilder::send_with()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah, I can make it consistent once we have agreed on the initiate name in the previous feedback.

Copy link
Contributor

@graebm graebm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix & ship

@waahm7 waahm7 merged commit dca33b4 into main Dec 26, 2024
13 checks passed
@waahm7 waahm7 deleted the upload-initiate branch December 26, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants