Raido integration - concurrency control #10
Random thoughts
Auckland Uni talked about marking certain fields of a raid as being "locked for update by the integration".
Even with something like channel locking, we would still need to implement some kind of MVCC-style concurrency control for the app-client (and any near real-time "pull" model integrations). Or we could just say that Raido doesn't support the push model? The push model brings some nice benefits though:
It is preferred that RAiD does not dictate a concurrency control strategy to the raid-agency. It would be better if agencies can choose how they deal with concurrency control, rather than being forced to do it a specific way. For example, a specific agency might choose not to support push model integrations and could just use a simple optimistic locking approach. Also, it's likely someone will figure out a better approach than the one we come up with for Raido (given the rush for Raido implementation and adoption, and the lack of resources). We can then adopt it into the RAiD handbook, or even into Raido.
Merging stuff to investigate
Note: I'm framing this discussion in terms of Raido, but the topic is a concern for RAiD as well.
After talking to Auckland Uni and Redbox integration teams, it seems some folks are wanting to integrate via a "push" model - where their local system is the "source of truth" for their data and they occasionally "push" or "publish" their data to Raido.
Up till now, I'd been assuming integration would follow a "pull" model - where Raido was considered the central "source of truth" and systems would integrate by always calling Raido APIs in near real-time.
This leads to a problem both teams have already encountered in their integrations with other systems - the dreaded lost update.
Note that concurrency control is an issue that Raido needs to address regardless of folks integrating via a "push" or "pull" model.
Even if an institution were using only the Raido app-client (i.e. our web UI), lost updates can still happen if concurrency control is not addressed. Right now, using the current Raido app-client, you can observe the lost update issue by using two separate browsers and carefully timing your read and update actions (note that this is a little bit tricky to actually reproduce, because the app-client uses window focus re-fetching).
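The lost update described above can be reproduced in a few lines. This is only an illustrative sketch: the in-memory `store`, the `read_raid` and `write_raid` helpers, and the field names are hypothetical stand-ins, not Raido's actual API or data model.

```python
# Hypothetical in-memory stand-in for a raid store with no concurrency control.
store = {
    "raid-1": {"title": "Original title", "description": "Original description"},
}


def read_raid(handle):
    # Each client gets its own copy, as a browser tab or integration would.
    return dict(store[handle])


def write_raid(handle, raid):
    # Blind overwrite of the whole record: the last write wins.
    store[handle] = dict(raid)


# Two clients read the same raid before either of them writes.
client_a = read_raid("raid-1")
client_b = read_raid("raid-1")

# Each client changes a different field in its own copy.
client_a["title"] = "Title edited by A"
client_b["description"] = "Description edited by B"

write_raid("raid-1", client_a)
write_raid("raid-1", client_b)  # silently discards A's title change

final = read_raid("raid-1")
print(final)
```

Neither client sees an error, yet A's edit is gone because B wrote back a full copy that was read before A's update.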
The original plan to deal with concurrency control was to either use a simple optimistic locking approach or a full-fledged MVCC scheme, likely via extra fields in the `IdBlock`. The plan was to implement a simple "first update wins" strategy and the second updater (either via API or app-client) would see an error and would have to re-try.
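A minimal sketch of what a "first update wins" check could look like, assuming a single integer `version` field on the raid. The `Raid`, `RaidStore`, and `StaleVersionError` names are made up for illustration; where the version would actually live (for example in the `IdBlock`) and how it would be exposed through the API are open questions, not something this sketch settles.

```python
from dataclasses import dataclass, replace


class StaleVersionError(Exception):
    """Raised when an update is based on an out-of-date version."""


@dataclass
class Raid:
    handle: str
    title: str
    version: int = 1  # hypothetical version field


class RaidStore:
    """In-memory stand-in for the raid service (illustration only)."""

    def __init__(self):
        self._raids = {}

    def create(self, handle, title):
        self._raids[handle] = Raid(handle, title)

    def read(self, handle):
        # Return a copy so callers cannot mutate the stored record directly.
        return replace(self._raids[handle])

    def update(self, raid):
        current = self._raids[raid.handle]
        # First update wins: reject writes based on an older version.
        # A real implementation would need this check and the write to be
        # atomic, e.g. a conditional UPDATE on the version column.
        if raid.version != current.version:
            raise StaleVersionError(
                f"{raid.handle}: expected version {current.version}, got {raid.version}"
            )
        self._raids[raid.handle] = replace(raid, version=current.version + 1)


store = RaidStore()
store.create("raid-1", "Original title")

first = store.read("raid-1")
second = store.read("raid-1")

first.title = "Title from first updater"
store.update(first)  # accepted, stored version becomes 2

second.title = "Title from second updater"
try:
    store.update(second)  # still carries version 1
    second_rejected = False
except StaleVersionError:
    second_rejected = True
    # Re-try: re-read the current state, re-apply the change, update again.
    retry = store.read("raid-1")
    retry.title = "Title from second updater"
    store.update(retry)
```

The re-try here simply re-applies the second change on top of the first. That is reasonable when the conflict is detected seconds later, which is the case this simple strategy handles well.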
But a simple implementation like that is going to cause problems with a "push" integration model. The "push" model makes lost updates likely if people are using both their own "delayed push" API integration and doing real-time updates to the same raid. And much worse, it results in temporal displacement - the conflict is only known about at the time the user tries to publish their raid data. When concurrency conflicts are detected quickly, they're usually easy to resolve. When the updates that cause conflicts are potentially separated by weeks, the conflicts become much more difficult to resolve.
Unless considered carefully - a simple locking scheme could easily result in a situation where a raid gets "stuck": the pushing system will no longer be able to publish the raid because its version is stale. They would need to refresh the raid and then somehow merge the updates made (some time ago) in their local system with the updates made in Raido. Or Raido will have to implement merging logic... somehow.
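To make the merging problem concrete, here is a sketch of one well-known technique, a field-level three-way merge. It is not something Raido implements or has committed to. It assumes the pushing system kept a `base` snapshot from its last successful sync, and that a raid can be treated as a flat dictionary of fields; both are simplifying assumptions, and the field names are invented.

```python
def three_way_merge(base, local, remote):
    """Merge field by field; return (merged, conflicts).

    base:   snapshot from the last successful sync (assumed to be retained)
    local:  the pushing system's current copy
    remote: the current copy held by the central service
    A missing key is treated as None, so deletions are not distinguished
    from fields explicitly set to None.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            value = l  # both sides agree
        elif l == b:
            value = r  # only the remote side changed this field
        elif r == b:
            value = l  # only the local side changed this field
        else:
            conflicts.append(key)  # both changed it, differently
            value = r  # provisional choice; needs a human decision
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"title": "Study A", "description": "Draft", "endDate": "2024-12-31"}
local = {"title": "Study A", "description": "Final description", "endDate": "2025-06-30"}
remote = {"title": "Study A (renamed)", "description": "Draft", "endDate": "2025-01-31"}

merged, conflicts = three_way_merge(base, local, remote)
print(merged)
print(conflicts)
```

Fields changed on only one side merge automatically, but the `endDate` conflict still needs someone to decide, possibly weeks after the edits were made. That is the hard part, and this sketch does not solve it.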
Any approach based on data merging will be tricky to implement. We could probably reduce the likelihood of concurrency conflicts by extending the MVCC concept down to "block" level, but that opens up its own can of worms (reconciling the concept of "raid version" with "block version", etc.) and doesn't really solve the root problem - it just pushes the concurrency issues down to block level instead of raid level.
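A rough sketch of block-level versioning, to show both the benefit and the limitation. The `BlockVersionedRaid` class, the block names, and the rule that any block update also bumps a raid-level counter are all assumptions made for illustration; how "raid version" and "block version" should really relate is exactly the open question.

```python
class StaleBlockError(Exception):
    """Raised when a block update is based on an out-of-date block version."""


class BlockVersionedRaid:
    """Illustration only: one version counter per block, plus a raid counter."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.block_versions = {name: 1 for name in blocks}
        self.raid_version = 1

    def read_block(self, name):
        return self.blocks[name], self.block_versions[name]

    def update_block(self, name, value, expected_version):
        # Only this block's version is checked, so edits to other blocks
        # made in the meantime do not cause a conflict.
        if self.block_versions[name] != expected_version:
            raise StaleBlockError(name)
        self.blocks[name] = value
        self.block_versions[name] += 1
        self.raid_version += 1  # assumed rule: any block change bumps the raid


raid = BlockVersionedRaid({"titles": "Old title", "dates": "2024"})

# Two updaters read different blocks at the same time.
_, titles_version = raid.read_block("titles")
_, dates_version = raid.read_block("dates")

# Updates to different blocks both succeed.
raid.update_block("titles", "New title", titles_version)
raid.update_block("dates", "2025", dates_version)

# A third update to "titles" based on the old version is still rejected.
try:
    raid.update_block("titles", "Another title", titles_version)
    stale_rejected = False
except StaleBlockError:
    stale_rejected = True
```

Conflicts across different blocks disappear, but two writers touching the same block hit the same problem as before, only at a finer granularity.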