Making time between object validity constant and small on S3 #605

Closed
srerickson opened this issue Jun 9, 2022 · 3 comments

Comments

@srerickson
Contributor

srerickson commented Jun 9, 2022

One of the pain points when updating OCFL objects on S3 is that there is no constant-time move/rename operation (as there is on a filesystem) to use when moving content files into place. As a result, objects are invalid for however long it takes to upload new content files and/or copy them between keys on S3. Ideally, the time an object spends invalid between two valid states would be constant and would not vary with the number or size of content files. This would make it much easier to implement object locking for version updates (e.g., lock expiration times could be set with more confidence).

One way to make the time between object validity constant would be to allow incomplete version directories for the head+1 version. I'm not saying this is the best approach; it's just the first idea that comes to mind.
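To make the head+1 idea concrete, here is a minimal Python sketch, not a proposal for the spec and not how any existing OCFL library works. It assumes an incomplete head+1 version directory is tolerated, and it uses a plain dict as a stand-in for an S3 bucket. The `Store` alias and the `stage_version` and `commit_version` helpers are hypothetical names made up for illustration, and the `.sha512` sidecar name assumes the object's digest algorithm is sha512.

```python
from typing import Dict

# In-memory stand-in for an S3 bucket: key -> bytes. Hypothetical; a real
# implementation would issue PUT requests through an S3 client instead.
Store = Dict[str, bytes]


def stage_version(store: Store, obj_root: str, version: str,
                  content: Dict[str, bytes]) -> int:
    """Upload content files under the new (head+1) version directory.

    This step takes time proportional to the number and size of files.
    The root inventory is not touched, so it keeps describing the old head.
    Returns the number of keys written.
    """
    for path, data in content.items():
        store[f"{obj_root}/{version}/content/{path}"] = data
    return len(content)


def commit_version(store: Store, obj_root: str, version: str,
                   inventory: bytes, sidecar: bytes) -> int:
    """Publish the new version with a fixed number of small writes.

    Writes the version inventory and sidecar, then replaces the root
    inventory and sidecar. The count does not depend on content size.
    Returns the number of keys written.
    """
    writes = {
        f"{obj_root}/{version}/inventory.json": inventory,
        f"{obj_root}/{version}/inventory.json.sha512": sidecar,
        f"{obj_root}/inventory.json": inventory,
        f"{obj_root}/inventory.json.sha512": sidecar,
    }
    store.update(writes)
    return len(writes)
```

With this ordering, the slow part (`stage_version`) happens while the root inventory still points at the old head, and the switch to the new head is four small writes regardless of how much content was staged. The object is still briefly inconsistent between those four writes, so this shortens and bounds the window rather than removing it.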

@pwinckles

I had considered write-locking objects during validation, at least in ocfl-java, but ended up simply noting in the docs that updating an object while it's being validated will produce inaccurate results. I'm struggling to remember now if there was a specific reason why I decided not to lock.

@srerickson
Contributor Author

srerickson commented Jun 14, 2022

My sense is that the urgency of this issue will depend on the overall architecture in which OCFL is used. If operations (access/validation/update) for objects in a storage root are coordinated by a single server process, preventing simultaneous validations and updates is more straightforward. On the other hand, distributed (e.g., git-like) architectures might be easier to implement if transitions between object states during updates were more predictable.
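For the single-server case, a rough sketch of what per-object locks with expiration times might look like in Python. The `LeaseLocks` class is a hypothetical name for illustration only; it is not thread-safe and keeps its state in one process's memory, so it says nothing about the distributed case.

```python
import time
from typing import Dict, Optional, Tuple


class LeaseLocks:
    """Per-object write locks that expire, held in one process's memory."""

    def __init__(self) -> None:
        # object id -> (owner, expiry time in seconds)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, object_id: str, owner: str, ttl: float,
                now: Optional[float] = None) -> bool:
        """Take or renew the lease; fail if another owner's lease is live."""
        now = time.monotonic() if now is None else now
        held = self._leases.get(object_id)
        if held is not None and held[0] != owner and held[1] > now:
            return False
        self._leases[object_id] = (owner, now + ttl)
        return True

    def release(self, object_id: str, owner: str) -> None:
        """Drop the lease, but only if the caller is the current owner."""
        held = self._leases.get(object_id)
        if held is not None and held[0] == owner:
            del self._leases[object_id]
```

The `ttl` is the hard part: it has to outlast the longest stretch during which the object may be invalid. If that stretch grows with the number and size of content files, no fixed `ttl` is safe, which is why a constant-time transition would let expiration times be chosen with more confidence.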

@neilsjefferies
Member

Merged into #372
