Reconnecting media that failed to be moved from source to destination on ADO persistence #226

Open
DiegoPino opened this issue Jan 3, 2025 · 0 comments
Assignees
Labels
enhancement New feature or request File processing Everything is a file, even me. queue workers Ones taking the FI and doing the FO queue FIFO VBO Actions I got my head out the sunroof
Milestone

Comments

@DiegoPino

What?

So first a disclaimer (and this is on me many times, sorry). The AWS SDK (S3) requires any file larger than 5 GB to be uploaded via multipart. We implement this correctly. But...

@alliomeria this is important!

...If your Archipelago (production) is routing an AWS S3 bucket via MinIO instead of using it directly from AWS via S3FS, then MinIO (a bug there) will incorrectly send a header to Amazon that does not conform to the current API spec.

Uploading a temp file to its final destination (managed by Archipelago) will then fail with this message in MinIO:

- API: CopyObjectPart(bucket=YOURBUCKET object=media/111/video-UUID-verylarge.mp4)
Time: 21:18:09 UTC 01/03/2025
DeploymentID: XXXX
RequestID: XXXX
RemoteHost: XXXXX
Host: esmero-minio:9000
UserAgent: aws-sdk-php/3.324.10 ua/2.0 OS/Linux#6.1.112-122.189.amzn2023.aarch64 lang/php#8.1.25 GuzzleHttp/7
Error: x-amz-server-side-encryption header is not supported for this operation. (minio.ErrorResponse)

If your storage is 100% managed by MinIO, no issue. If your storage on the Drupal side of things goes directly to AWS S3, no issue either.
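For context on the multipart requirement mentioned above, here is a minimal sketch (in Python for illustration, not the Drupal/PHP code Archipelago actually runs) of how a large copy gets split into ranged parts. The function names `needs_multipart` and `plan_copy_parts` and the 512 MiB default part size are assumptions made for this example. The 5 GB threshold is treated as 5 GiB here, which is also an assumption.

```python
from typing import List, Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
MAX_SINGLE_COPY = 5 * GIB  # assumption: the "5 GB" limit read as 5 GiB
MAX_PARTS = 10_000         # S3 allows at most 10,000 parts per multipart upload


def needs_multipart(size: int) -> bool:
    """True when an object is too large for a single-request copy."""
    return size > MAX_SINGLE_COPY


def plan_copy_parts(size: int, part_size: int = 512 * MIB) -> List[Tuple[int, str]]:
    """Return (part_number, byte_range) pairs that cover `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 5 * MIB <= part_size <= 5 * GIB:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    parts: List[Tuple[int, str]] = []
    start = 0
    while start < size:
        # Byte ranges are inclusive, so the last byte of a part is end - 1.
        end = min(start + part_size, size) - 1
        parts.append((len(parts) + 1, f"bytes={start}-{end}"))
        start = end + 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts
```

Each planned range corresponds to one part-copy request, which is the operation (CopyObjectPart) that shows up in the MinIO error above.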

OK, that said, let's say your file was not correctly uploaded to media/111/video-UUID-verylarge.mp4. As a failsafe we will still "connect" it to the original source (to avoid total failure) and it will stay where it was (so you can still stream it), e.g. at /media/upload/verylarge.mp4. The issue with this? Well, if you delete your originals, everything breaks for that file. But most importantly, IIIF (Cantaloupe) won't see it and won't be able to provide anything.

Specifically, this one here does the job during normal operation:

https://github.com/esmero/strawberryfield/blob/ce3aae811900f49996a1b2bb631b64b602a004f2/src/EventSubscriber/StrawberryfieldEventPresaveSubscriberFilePersister.php#L107

calling this here:

https://github.com/esmero/strawberryfield/blob/main/src/StrawberryfieldFilePersisterService.php#L764-L883

The thing is, once we pass that stage (whether or not the copy succeeded) we make the file permanent to avoid failure, and that whole code will never run again. That makes re-establishing the Archipelago "we manage your file" contract impossible, except via custom code and manually copying things.
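To make that gating concrete, here is a small sketch in Python (illustrative only, not the strawberryfield code). It assumes, as described above, that the decision to run persistence hinges on the file not yet being permanent. The `FileEntity` class, the `needs_persist_retry` flag, and both function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class FileEntity:
    uri: str
    permanent: bool
    # Hypothetical flag: set when the move to the managed location failed.
    needs_persist_retry: bool = False


def should_persist_today(file: FileEntity) -> bool:
    """Assumed current behaviour: only non-permanent files get persisted."""
    return not file.permanent


def should_persist_with_flag(file: FileEntity) -> bool:
    """Variant where an explicit flag can re-open the door on a later save."""
    return (not file.permanent) or file.needs_persist_retry
```

With only the first check, a file that was made permanent after a failed copy can never be picked up again. The second check lets a later save retry it.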

Now that we know the problem, on to the solution.

What do we do?

A few options.

  • Allow a retry on the next save, e.g. add a flag somewhere so that we do not depend only on whether the "file" entity is permanent to decide if we run or not.
  • Emergency: a VBO action that does exactly what ::persistFilesInJsonToDisks does, but without the "constraints" there. Moreover, it validates first. Here is how:
    • For every File in an ADO
    • Get the current Storage Location.
    • If Current Storage Location != Desired Location
      • Check if there is already a file in Desired Location, if not Copy to Desired Location
      • Update the File Entity to use the Desired Location
      • Save.
      • Done.
    • Else, do nothing.
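The per-file steps above can be sketched as follows. This is Python for illustration, not Drupal code: storage is modelled as a plain dictionary from URI to bytes, and `ManagedFile` and `reconnect_file` are hypothetical names. A real action would go through the file system and File entity APIs instead.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ManagedFile:
    uri: str  # current storage location


def reconnect_file(file: ManagedFile, desired_uri: str, storage: Dict[str, bytes]) -> str:
    """Move a file's reference to its desired location, copying only if needed."""
    if file.uri == desired_uri:
        return "unchanged"  # already where it should be, do nothing
    if desired_uri in storage:
        outcome = "relinked"  # a copy already exists at the destination
    else:
        if file.uri not in storage:
            raise FileNotFoundError(file.uri)
        storage[desired_uri] = storage[file.uri]  # copy to the desired location
        outcome = "copied"
    file.uri = desired_uri  # update the entity; the caller would then save it
    return outcome
```

The source is left in place on purpose. Whether to clean it up afterwards is a separate decision that this sketch does not make.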

My only concern is VBO. The files I saw fail today are 20 GB+. I am not sure VBO will be able to do this in a single run. So the VBO part will probably only do the "check if it needs fixing" step and then enqueue into the new AMI Action Queue to actually do the job.
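A rough sketch of that split, again in Python for illustration: the cheap check runs inline and only the mismatching files are queued for the heavy copy. A `collections.deque` stands in for the queue here. It is not the AMI Action Queue API, and `enqueue_files_needing_fix` and `drain` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Tuple

# (file_id, current_uri, desired_uri)
FileRow = Tuple[int, str, str]


def enqueue_files_needing_fix(rows: Iterable[FileRow], queue: Deque[FileRow]) -> int:
    """Cheap pass: queue every file whose current location is not the desired one."""
    queued = 0
    for row in rows:
        _file_id, current_uri, desired_uri = row
        if current_uri != desired_uri:
            queue.append(row)
            queued += 1
    return queued


def drain(queue: Deque[FileRow], worker: Callable[[FileRow], None]) -> int:
    """Slow pass, run later by a queue worker: process items in FIFO order."""
    done = 0
    while queue:
        worker(queue.popleft())
        done += 1
    return done
```

Keeping the comparison in the bulk action and the copy in the worker means the bulk action itself never has to move 20 GB of data.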

Thanks

@DiegoPino DiegoPino added enhancement New feature or request queue FIFO queue workers Ones taking the FI and doing the FO File processing Everything is a file, even me. VBO Actions I got my head out the sunroof labels Jan 3, 2025
@DiegoPino DiegoPino self-assigned this Jan 3, 2025
@DiegoPino DiegoPino added this to the 0.9.0 milestone Jan 3, 2025