Raido integration - concurrency control #10

ardc-shorn · 2022-12-02T01:48:21Z

Note: I'm framing this discussion in terms of Raido, but the topic is a concern for RAiD as well.

After talking to Auckland Uni and Redbox integration teams, it seems some folks are wanting to integrate via a "push" model - where their local system is the "source of truth" for their data and they occasionally "push" or "publish" their data to Raido.

Up till now, I'd been assuming integration would follow a "pull" model - where Raido was considered the central "source of truth" and systems would integrate by always calling Raido APIs in near real-time.

This leads to a problem both teams have already encountered in their integrations with other systems - the dreaded lost update.

Note that concurrency control is an issue that Raido needs to address regardless of folks integrating via a "push" or "pull" model.
Even if an institution were using only the Raido app-client (i.e. our web UI), lost updates can still happen if concurrency control is not addressed. Right now, using the current Raido app-client, you can observe the lost update issue by using two separate browsers and carefully timing your read and update actions (note that this is a little bit tricky to actually reproduce, because the app-client uses window focus re-fetching).
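To make the lost update concrete, here is a minimal sketch using a hypothetical in-memory store with no concurrency control. The `store`, `read` and `update` names are made up for illustration and are not Raido APIs; the field names are placeholders too.

```python
# Hypothetical in-memory stand-in for a raid store with NO concurrency control.
store = {"raid-1": {"title": "Original title", "description": "Original description"}}


def read(raid_id):
    # Return a copy, as a client would receive over the wire.
    return dict(store[raid_id])


def update(raid_id, raid):
    # Blind overwrite: whoever writes last replaces the whole record.
    store[raid_id] = dict(raid)


# Two clients read the same starting state.
alice = read("raid-1")
bob = read("raid-1")

alice["title"] = "Alice's title"
update("raid-1", alice)

bob["description"] = "Bob's description"
update("raid-1", bob)  # silently discards Alice's title change

print(store["raid-1"])
```

Neither client sees an error, but Alice's title change is gone because Bob's write was based on data read before her update.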

The original plan to deal with concurrency control was to either use a simple optimistic locking approach or a full-fledged MVCC scheme. Likely via extra fields in the IdBlock.

The plan was to implement a simple "first update wins" strategy and the second updater (either via API or app-client) would see an error and would have to re-try.
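A rough sketch of what "first update wins" could look like with a single version number per raid. This is an assumption about the shape of the solution, not the actual Raido implementation: `RaidStore`, `StaleVersionError` and the `version` field are hypothetical names.

```python
class StaleVersionError(Exception):
    """Raised when an update is based on an out-of-date version."""


class RaidStore:
    """Hypothetical in-memory store using optimistic locking."""

    def __init__(self):
        self._raids = {}

    def mint(self, raid_id, data):
        self._raids[raid_id] = {"version": 1, "data": dict(data)}
        return 1

    def read(self, raid_id):
        raid = self._raids[raid_id]
        return raid["version"], dict(raid["data"])

    def update(self, raid_id, expected_version, data):
        raid = self._raids[raid_id]
        # Reject the write if someone else has updated since the caller's read.
        if raid["version"] != expected_version:
            raise StaleVersionError(
                f"expected version {expected_version}, found {raid['version']}"
            )
        raid["version"] += 1
        raid["data"] = dict(data)
        return raid["version"]


store = RaidStore()
store.mint("raid-1", {"title": "Original title"})

# Two updaters read the same version.
v_a, data_a = store.read("raid-1")
v_b, data_b = store.read("raid-1")

data_a["title"] = "Title from A"
store.update("raid-1", v_a, data_a)  # first update wins

data_b["title"] = "Title from B"
try:
    store.update("raid-1", v_b, data_b)
    second_rejected = False
except StaleVersionError:
    second_rejected = True
    # Re-try: re-read, re-apply the change, submit against the fresh version.
    v_b, data_b = store.read("raid-1")
    data_b["title"] = "Title from B"
    store.update("raid-1", v_b, data_b)
```

The re-try here is trivial because the second updater re-reads immediately. That is the assumption that breaks down for delayed "push" integrations.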

But a simple implementation like that is going to cause problems with a "push" integration model. The "push" model makes lost updates likely if people are using both their own "delayed push" API integration and doing real-time updates to the same raid. And much worse, it results in temporal displacement - the conflict is only known about at the time the user tries to publish their raid data. When concurrency conflicts are detected quickly, they're usually easy to resolve. When the conflicting updates are potentially separated by weeks, the conflict becomes much more difficult to resolve.

Unless considered carefully - a simple locking scheme could easily result in a situation where a raid gets "stuck": the pushing system will no longer be able to publish to the raid because its version is stale. They would need to refresh the raid and then somehow merge the updates made (some time ago) in their local system with the updates made in Raido. Or Raido will have to implement merging logic... somehow.
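The "stuck" situation can be sketched like this, assuming a simple per-raid version check. All names (`server`, `local`, `base_version`, `try_publish`) are hypothetical and only illustrate the failure mode.

```python
# Hypothetical server-side state: the raid has moved on since the last sync.
server = {
    "version": 3,
    "data": {"title": "Edited in app-client", "description": "v1 description"},
}

# Hypothetical local copy held by a push-style integration, synced at version 1.
local = {
    "base_version": 1,
    "data": {"title": "v1 title", "description": "Edited locally"},
}


def try_publish(local, server):
    """Return True if the push is accepted under a simple version check."""
    if local["base_version"] != server["version"]:
        return False
    server["data"] = dict(local["data"])
    server["version"] += 1
    local["base_version"] = server["version"]
    return True


# Re-trying alone never helps: the local base version stays stale.
attempts = [try_publish(local, server) for _ in range(3)]

# A naive refresh unsticks the raid, but only by throwing away the local edit.
local = {"base_version": server["version"], "data": dict(server["data"])}
accepted = try_publish(local, server)
```

The push only succeeds once the local "Edited locally" description has been discarded, which is why some kind of merge step (on the integrator side or in Raido) would be needed.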

Any data merging based approach will be tricky to implement. We could probably reduce the likelihood of concurrency conflicts by extending the MVCC concept down to "block" level, but that opens up its own can of worms (reconciling the concept of "raid version" with "block version", etc.) and doesn't really solve the root problem - it just pushes the concurrency issues down to block level instead of raid level.
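As an illustration of the block-level idea, the sketch below versions each block separately. The block names and the `update_block` helper are hypothetical, not the real raid metadata schema.

```python
class StaleBlockError(Exception):
    """Raised when a block update is based on an out-of-date block version."""


# Hypothetical raid with an independent version per block.
raid = {
    "titles": {"version": 1, "value": ["Original title"]},
    "descriptions": {"version": 1, "value": ["Original description"]},
}


def update_block(raid, block_name, expected_version, value):
    block = raid[block_name]
    if block["version"] != expected_version:
        raise StaleBlockError(block_name)
    block["version"] += 1
    block["value"] = value


# Clients A and B both read at block version 1 but touch different blocks,
# so neither update conflicts.
update_block(raid, "titles", 1, ["New title"])
update_block(raid, "descriptions", 1, ["New description"])

# Client C also read "titles" at version 1, so its update still conflicts.
try:
    update_block(raid, "titles", 1, ["Another title"])
    conflict = False
except StaleBlockError:
    conflict = True
```

Edits to different blocks no longer collide, but two edits to the same block hit exactly the same problem as before, and there is now no single "raid version" to reason about.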

ardc-shorn · 2022-12-02T02:06:55Z

Random thoughts

Auckland Uni talked about marking certain fields of a raid as being "locked for update by the integration"

Or maybe we could just mark a raid that was minted via an integration as being locked for update by that integration?
- so we have the concept of either a raid-level or block-level "channel lock"?
  - block-level channel locking would imply block-level versioning though 😒
- can build functionality to "unlock" or "change locking channel"

Even with something like channel locking, still need to implement some kind of MVCC style concurrency control for app-client (and any near real-time "pull" model integrations).

Or we could just say that Raido doesn't support the push model?
I'm guessing some folks are not going to like the concept of Raido not being the "source of truth" for raid data.

Push model brings some nice benefits though:

- less load on Raido servers, because most of the time push-style integrations will just be working locally, only occasionally publishing the data to Raido
- Raido API availability and responsiveness is less of a concern to push-style integrators: if Raido is down or they have network issues, users can still do their stuff, they just can't "publish"
  - presumably, this is why people started doing push-style integrations in the first place, even though they often get stung by the lost update issue

It is preferred that RAiD does not dictate a concurrency control strategy to the raid-agency. It would be better if they can choose how they deal with concurrency control, rather than being forced to do it a specific way.

For example, a specific agency might choose not to support push model integrations and could just use a simple optimistic locking approach.

Also, it's likely someone will figure out a better approach than we will come up with for Raido (given the rush for Raido implementation and adoption, and lack of resources). We can then adopt it into the RAiD handbook, or even Raido.

Merging stuff to investigate

ardc-shorn Mar 1, 2023
Author

Conversation with @ardc-shorn and @robleney-ardc - 2023-03-01.
Rob, please post with corrections if this is not what you meant.

Proposal for a simple approach:
Add an optional list of allowedChannels to the AccessBlock (or IdBlock, whichever).
allowedChannels set to empty implies "all channels are valid".
Example channels: [redbox, uq@rdm, raido-app] (need a controlled list vocab, I guess).

Example RedBox workflow:
When an integration uses a "push" integration approach and they mint a raid, they can set allowedChannels=redbox.
Then app-client would not be allowed to "Edit" the metadata.
We build a UI page where you can set allowedChannels, so if someone really needs to, they can change the allowed channels (might be restricted to SP_USER role or something).
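A minimal sketch of the allowedChannels check described above, assuming the list is stored alongside the raid and each API caller is identified by a channel name. The `may_edit` function and the `access` dictionary layout are hypothetical; the channel names are the examples from the proposal.

```python
def may_edit(allowed_channels, channel):
    """Return True if `channel` is allowed to edit the raid.

    An empty (or missing) list means "all channels are valid".
    """
    return not allowed_channels or channel in allowed_channels


# Raid minted by a push-style RedBox integration.
redbox_raid = {"access": {"allowedChannels": ["redbox"]}}

# Raid minted with no channel restriction.
open_raid = {"access": {"allowedChannels": []}}

print(may_edit(redbox_raid["access"]["allowedChannels"], "redbox"))     # True
print(may_edit(redbox_raid["access"]["allowedChannels"], "raido-app"))  # False
print(may_edit(open_raid["access"]["allowedChannels"], "raido-app"))    # True
```

This only decides which channel may edit. It does not replace version checking between concurrent updaters on the same channel.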

ardc-shorn Mar 8, 2023
Author

@robleney-ardc ping
