file size limit for upload/ingest (optional and configurable) #4387
Conversation
Force-pushed from 4fdf17f to 66e2cc5.
Hmm, this is not strictly true: since we support TIFFs, we can encounter, if rarely, genuinely larger files, because of all the stuff we don’t care about (layers and such). But it is true that most images like this are the result of user error in creating them, not genuine. And Grid isn’t used as an archive for genuinely authored multi-layered ones (we may hit this case if we ever want to store editable master files from a Photoshop integration, for example). Secondly, it’s often the quality of our imaging pipeline that makes things worse or bombs: lack of parallelism, but also bad habits around TIFFs etc. (those should be helped by, and also fixed in, #4068). Thirdly, it’s often dimensions, not filesize, and one does not always follow the other. All that said, this is a sensible and practical stopgap and prevents someone repeatedly uploading a (mostly absurd) image and bringing the whole thing down. The downside is that users don’t know how to reduce size effectively and losslessly. Maybe make it configurable (and not in effect if not configured)?
Hi folks. Agree that there should be a limit on filesize; it would be good for it to be configurable. We also need the ability to handle >500MB files: we're likely to get some large scans from archives (either by error or by design) which could conceivably go over this size. Short term, happy with picking failures up from the bucket; medium term, it would be good to get more targeted handling, potentially notifications to a large-image topic (or something like that).
Force-pushed from b358cb6 to 096e761.
@paperboyo @RichardLe - as per your comments, it's now optional and configurable via `upload.limit.mb`.
Force-pushed from 096e761 to f109695, then to 0a722dd, then to d815827.
Co-authored-by: Pete F <[email protected]>
tested both UI and S3 uploads in TEST and all working as described 👍👍
Seen on auth, usage, image-loader, metadata-editor, thrall, leases, cropper, collections, media-api, kahuna (merged by @twrichards 8 minutes and 32 seconds ago). Please check your changes!
Recently the image-loader boxes got completely choked up and the queue grew in size. This was down to multiple whopping great images (e.g. 1GB) sitting in the sweet spot (or rather not-so-sweet spot): too big to process successfully, but not so big as to cause an OoM, where the box falls over and is recycled (still not great, but at least eventually self-healing). Instead, the boxes stayed alive (passing healthcheck) while never completing the processing of the images.
What does this change?
This PR introduces an optional limit via a new config option `upload.limit.mb` (in `common.conf`, as it's used by both `kahuna` and `image-loader`). This limit is imposed in two places (both sketched after the list below)...

1. As `image-loader` picks an item off the ingest queue, so as to guard against images from any source (agency, photographer over FTP, or UI upload), which means we retain a copy of the offending image in the fail bucket for closer inspection.
2. For UI uploads (`kahuna`), to avoid burning the planet by wastefully sending all those bytes to S3 only to have the server tell us the file is too big, we can alert the user promptly in the client, like so...
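For illustration, enabling the limit might look something like this in `common.conf`. This is a sketch: the `500` value is an assumption, matching the 500MB figure discussed in this thread, and per the PR the limit only takes effect when the key is configured.

```
# common.conf: optional upload/ingest size limit, shared by kahuna and image-loader.
# If this key is absent, no limit is enforced.
upload.limit.mb=500
```

And a minimal sketch of the kind of guard this enables on the `image-loader` side. The names here (`UploadLimit`, `checkUploadSize`) are hypothetical, not the actual Grid code:

```scala
object UploadLimit {
  // Hypothetical sketch: reject a file picked off the ingest queue when it
  // exceeds the optional configured limit; if no limit is configured,
  // everything passes.
  def checkUploadSize(fileSizeBytes: Long, uploadLimitMb: Option[Int]): Either[String, Unit] =
    uploadLimitMb match {
      case Some(limitMb) if fileSizeBytes > limitMb.toLong * 1024 * 1024 =>
        Left(s"file size ($fileSizeBytes bytes) exceeds configured limit of ${limitMb}MB")
      case _ => Right(())
    }
}
```

A failing check on the queue side would correspond to the image landing in the fail bucket, as described in the test steps below.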
How should a reviewer test this change?
Try dropping an image over 500MB (e.g. download https://media.gutools.co.uk/images/25287e288b9254437e65209b541aa8e8895d571e) into the ingest queue bucket (in a dir with your name on it) and you should see this in the logs (as it's not a UI upload)...
...and the file should end up in the fail bucket.
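If you prefer the command line for that first step, something like this should do it (the bucket name and directory here are placeholders; use the actual ingest queue bucket in TEST):

```
aws s3 cp huge-image.tif s3://<ingest-queue-bucket>/<your-name>/huge-image.tif
```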
Try uploading the same file to the UI and you should see a much prompter failure.
How can success be measured?
Who should look at this?
Tested? Documented?