Our thoughts on OCFL over S3 #522
Comments
@marcolarosa Have you successfully created large numbers of buckets in the past? My understanding is that by default an AWS account may only have up to 100 buckets, and that this cap may be increased to a maximum of 1,000 (AWS reference). You shouldn't need to download the entire object in order to update it. I wouldn't expect that you'd need anything other than a copy of the most recent inventory.
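For what it's worth, a minimal sketch (AWS SDK for JavaScript v3; the bucket name, region and `getRootInventory` function here are placeholder assumptions, not part of ocfl-js) of fetching only an object's root `inventory.json` rather than pulling the whole object down:

```js
// Sketch only: read an object's most recent inventory directly from S3.
// The root inventory.json sits at the top of the OCFL object and is a copy
// of the latest version's inventory, so nothing else needs to be downloaded.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "ap-southeast-2" }); // region is a placeholder

async function getRootInventory(bucket, objectRoot) {
  const res = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: `${objectRoot}/inventory.json` })
  );
  // transformToString() is the SDK v3 helper for reading the body stream in Node
  return JSON.parse(await res.Body.transformToString());
}

// e.g. getRootInventory("my-ocfl-repo", "00/c7/b2/62/00c7b262...")
```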
@pwinckles Actually, I didn't know there were limits to how many buckets one could create! And looking at the reference page you linked, using the SHA512 id as the bucket name is also not allowed as it's too long. So I guess we need to think this through some more! Has anyone else tried using S3 as a backend? What were the design decisions and why were they taken?
Actually, this might be workable... There are only 256 top-level folders when pairtreeing a SHA512 id (00 - ff), so it is partially workable with a limit increase request and a naming convention for bucket names so that one could map from a SHA512 id to the correct AWS bucket: e.g. 00c7b262.... => pairtree: 00/c7/b2/62 --> my.reverse.domain.ocfl.00, with the rest of the path existing inside the bucket...
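A rough sketch of that convention in node (the `my.reverse.domain.ocfl` prefix is just the hypothetical example above, and the pairtree depth and use of the full hash at the leaf are my assumptions, not a settled layout):

```js
// Sketch: shard objects across 256 buckets keyed on the first pairtree
// segment of the SHA512 id, with the rest of the pairtree path inside the bucket.
const crypto = require("crypto");

function locate(objectId, bucketPrefix = "my.reverse.domain.ocfl") {
  const hash = crypto.createHash("sha512").update(objectId).digest("hex");
  const pairtree = hash.slice(0, 8).match(/.{2}/g);        // e.g. ["00", "c7", "b2", "62"]
  const bucket = `${bucketPrefix}.${pairtree[0]}`;          // my.reverse.domain.ocfl.00
  const key = `${pairtree.slice(1).join("/")}/${hash}`;     // c7/b2/62/<full hash> inside the bucket
  return { bucket, key };
}
```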
There is a working implementation in ocfl-java. I would not say that it's doing anything particularly clever or unexpected. It basically just maps storage paths directly to keys within a bucket, so what you see in a bucket is essentially the same as if it were on the filesystem, and it uses a DB for locking and resolving eventual consistency issues. As for performance: it is very slow compared to using the local filesystem, but perhaps no slower than the server is able to transfer data to S3. It has not been tested "at scale" yet. Sharding across buckets is an interesting idea, though it does feel a bit like a premature optimization unless you're confident it's going to be an issue. S3 scales buckets based on their load, and as long as your keys are well distributed, which I would expect them to be because they're prefixed with a hash, I would expect it to be able to cope.
I have no idea, which is why I wanted to reach out here and see what experience others have had when working with S3 at scale. My experience of S3 is limited to very simple usage with not a lot of data. However, now I'm potentially uploading 30,000+ OCFL objects containing 106TB of data. It's been a hard slog getting 70TB out of the backup system onto a disk, so if there's a best practice out there I'd like to start from there.
I have had a couple of thoughts about OCFL on S3 - these are just musings. When @marcolarosa was exploring this and discussing the limits of S3, such as the number of buckets and path length, he mentioned that one of the things you would lose with an S3 implementation would be the ability to inspect a filesystem to see what's where. I wonder if there could be a hybrid implementation mode where you keep the OCFL structure as a "skeleton" on a standard filesystem but the payload files actually contain URIs or similar that point to the content, so opening up a payload file would tell you where the content really lives. Bit of a hack, yes, but it would (a) preserve the ability for naive users to find files and where they are meant to be and (b) allow for storage-by-hash in a remote service, giving you repository-wide de-duplication (rather than the object-level de-duplication you get at the moment).
Merged into #372 |
Noting #372
This ticket is about our thinking around how to model an OCFL repo on S3. We have not implemented this yet, which is why we're looking for feedback here.
Our current demonstrator with 70TB in OCFL
We've built a demonstrator with about 70TB of data in OCFL (which you can see at http://115.146.80.165/), but for various reasons we need to bring forward our work on developing this backend option. Our id paths are pairtree'd SHA512 ids of the object identifiers.
Models for OCFL on S3
We think there are two ways to move from a filesystem to S3:
The repo lives inside a single bucket
At first glance this seems like the easiest option, but on further thought we don't think this solution will scale. Given that objects inside an S3 bucket are not actually hierarchical, we would expect the performance of operations on this bucket to decrease as the amount of content inside it grows. Obviously this is complicated by all of the extra path elements coming from the pairtree'd SHA512 ids.
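To make the "not actually hierarchical" point concrete, here's a sketch of how a storage-path "directory" in a single-bucket layout is really just a shared key prefix that has to be listed page by page (bucket name, region and the `listObjectKeys` helper are placeholders):

```js
// Sketch: list everything under one object's storage path in a single-bucket repo.
// S3 has no directories; "00/c7/b2/62/..." is just a common prefix on flat keys.
const { S3Client, ListObjectsV2Command } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "ap-southeast-2" }); // region is a placeholder

async function listObjectKeys(bucket, storagePath) {
  const keys = [];
  let ContinuationToken;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: `${storagePath}/`, ContinuationToken })
    );
    for (const item of page.Contents ?? []) keys.push(item.Key);
    ContinuationToken = page.NextContinuationToken; // ListObjectsV2 returns at most 1000 keys per page
  } while (ContinuationToken);
  return keys;
}
```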
Each OCFL object is its own bucket
In this model the SHA512 id would be the bucket name. We think this is the better option as we expect that the infra underpinning the S3 system would be optimised for mapping a bucket ID to a storage path in the cloud. Within that bucket one would find an OCFL object in the expected form. Although the performance of the bucket would degrade as the number of versions / items inside it increases, this would be trivial compared to the degradation of adding whole objects (and all of their associated paths) into a single bucket as in the first model.
Our current nodejs lib - practical considerations
We have a nodejs package that we use to interact with OCFL: https://github.com/CoEDL/ocfl-js. One of the key ideas is that updates happen outside of the OCFL hierarchy. Specifically, the library creates a `deposit` path and a `backup` path when updating an object, as a way of locking changes to the OCFL object whilst an update is in progress and ensuring an atomic move once the update has completed.

If the whole OCFL repo lived inside an S3 bucket, this process of creating and working with a deposit and backup path would be cumbersome. With a bucket per OCFL object, on the other hand, the deposit/backup area would have to live outside of the S3 system entirely. This has pros and cons: all object operations happen on a server outside of S3 (which I don't think can be avoided anyway), but it would require the library to first pull the whole object down before it could operate on it. In the case of very large objects (a few TB) this would result in quite significant slowdowns. (There might be ways to avoid this by using the ETag provided by AWS, but that's not really the point of this thread.)
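As a concrete illustration of why the `deposit`/`backup` dance is awkward inside an S3 bucket: S3 has no atomic move or rename, so "promoting" a staged deposit means copying every key and then deleting the staged copies. A minimal sketch is below; the bucket, prefixes and `promoteDeposit` function are hypothetical, and a real implementation would also need multipart copy for files over 5GB and some form of locking (as ocfl-java does with a DB).

```js
// Sketch: moving a staged "deposit" into place is a copy-then-delete per key,
// not an atomic filesystem move.
const {
  S3Client,
  ListObjectsV2Command,
  CopyObjectCommand,
  DeleteObjectCommand,
} = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "ap-southeast-2" }); // region is a placeholder

async function promoteDeposit(bucket, depositPrefix, objectRoot) {
  let ContinuationToken;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: depositPrefix, ContinuationToken })
    );
    for (const { Key } of page.Contents ?? []) {
      const target = Key.replace(depositPrefix, objectRoot);
      // CopyObject handles at most 5GB in a single call; larger files need multipart copy.
      await s3.send(
        new CopyObjectCommand({ Bucket: bucket, CopySource: `${bucket}/${Key}`, Key: target })
      );
      await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key }));
    }
    ContinuationToken = page.NextContinuationToken;
  } while (ContinuationToken);
}
```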
So - how does this sound to people? Is there something missing here? Are there already examples that have tackled this question at scale?