Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-device / sync support #791

Open
TheLastProject opened this issue Feb 5, 2022 · 17 comments
Open

Multi-device / sync support #791

TheLastProject opened this issue Feb 5, 2022 · 17 comments
Labels
common: occasional Affects or can be seen by some users regularly or most users rarely severity: major Severely degrades major functionality or product features, with no satisfactory workaround type: enhancement New feature or request

Comments

@TheLastProject
Copy link
Member

We need to build and integrate Catima Sync.

I am closing all other issues related to multi-device or syncing support for the following reasons:

  1. To keep a clear overview
  2. Because Google is increasing limitations on filesystem access from apps, so local options with things like Syncthing and Nextcloud are less and less useful for most users
  3. WebDAV does not have sufficient conflict resolution for Catima's file format and it is important conflicts can be solved in-app easily.
@TheLastProject TheLastProject added type: enhancement New feature or request severity: major Severely degrades major functionality or product features, with no satisfactory workaround common: occasional Affects or can be seen by some users regularly or most users rarely labels Feb 5, 2022
@IllusiveMan196
Copy link
Contributor

Syncthing can't copy files in Android/data and Android/obb. The rest seems to be accessible. Even then it is not or rather should not be an issue on Android versions lower than 11. If Catima saves things into common directory or outside Android folder (as it should and does for me) this isn't really an issue. But if we can get data off the phone somehow I'm not against it as I don't really like indirect vendor lock-ins. Wanna have your data - use our services" style.

@tobast
Copy link

tobast commented Aug 30, 2022

Hello,
Has any work started on this? I can see a bit of discussion in the issues of the Sync repo, but no code at all. If I can find time, I might be able to work on this.

@TheLastProject
Copy link
Member Author

No work has started on this, I haven't gotten to it and I don't think I will get to this any time soon. This is also quite low on the priority list for myself as I only have one phone.

I basically have the following concerns:

  1. Internet permission: as most users don't need this feature, giving every single Catima install internet access for this feels not a good start for privacy
  2. Hosting: I'm already paying money for the Catima project, and that's fine for a hobby project (not everything needs to make money), but I'm not looking forward to the disaster of dealing with user data without getting paid. So we need a self-hostable solution (also for privacy concerns). I could possibly set up a paid host under my control later if things seem to work fine, but I want to make sure it's easily self-hostable first (privacy-friendly option first, "mass consumer" later).

For WearOS, I have a similar "tainting" issue as number 1 and my plan is to split the syncing into a separate app and communicate with the main Catima app using Android intents (see #25 (comment)). I haven't started on this yet so I'm not sure how viable it is, but a secondary sync app which uses LIST_CARDS, READ_CARD, EDIT_CARD, CREATE_CARD permissions to read and update the cards Catima has seems like a very clean way to go about this. See also https://developer.android.com/guide/topics/permissions/defining.

A secondary concern is that card IDs are currently sequential, so adding from a second device if the first device has been offline for a while could cause conflicts: CatimaLoyalty/Sync#8.

For the Sync repo I was thinking of using something like Django because I personally have experience with it and it has a nice built-in admin control panel and other convenience features, but I don't have very strong opinions for this. The landscape might have moved on and given nice new options since I last built a web app myself.

So basically, this is a very huge task, and help is definitely very welcome! Do you have a plan on how you want to tackle this and where you want to start? I'm thinking a fake app that just reads/edits/injects cards offline at first to communicate with Catima and deal with conflict resolution might be the cleanest start, but I'm quite interested in what you think too.

@tobast
Copy link

tobast commented Aug 30, 2022

I also only have one phone, but I see this as a great way to have backups, and maybe even enable cards sharing between multiple users of the same server.

My experience with phone apps development is extremely limited, and last time I used Java was probably around 10 years ago, but I'm experienced with Django and thought I could mostly help there, especially as the expected endpoints are already documented in the issues there.

I completely agree on the self-hosting requirement; actually, I wouldn't be interested in this myself if it was not meant to be self-hostable :) If, in the future, there are plans to host this on an "official" server, some serious privacy concerns would be involved, and I expect this would require some form of encryption. For now, I was considering only the self-hosted model, where the server can usually be considered trustworthy.

As for the ID issue, is the ID actually used extensively? It feels to me like this should not be a problem, and using either a (randomly-generated) uuid4 or, say, some hash (only to map it to an integer) of the card data instead would be enough. I feel like in most cases, the card data is a good identifier, but this would mean that changing the card data (eg if it expires yearly and the user rescans it) would delete and recreate the card behind the scenes, instead of changing the data.

If the goal is to minimize the changes required on the mobile app, I even believe that matching a card by (a hash of) its data could be fine, and different phones could have different IDs in their local DB for the same card. This looks messy and could create weird issues in the future, though.

If it seems good to you, my plan would be to start with a Django app without any interface apart from the django-admin, with :

  • models
  • authentication (I believe tokens would be more fitting, to be able to revoke a single phone if eg. it gets stolen)
  • API endpoints

The conflicts handling looks like the biggest issue, but it seems to me that most of the testing can be done in unit tests instead of creating a mock client -- this would come with the benefits of specifying clearly where user intervention is needed and where the merge can be automatic.

@TheLastProject
Copy link
Member Author

I'm experienced with Django and thought I could mostly help there, especially as the expected endpoints are already documented in the issues there

Help with the Django part sounds great. Please do note that the endpoints I made issues for were my ideas based on (most likely outdated) best practice knowledge. If you feel I'm wrong anywhere, feel free to suggest improvements :)

As for the ID issue, is the ID actually used extensively

The ID is the single most important element. It's the "primary key" in the SQLite database on the Android app. It is used when opening cards from the main screen, when referring to which cards are in a group, when setting and opening a shortcut widget, etc. The ID should never change. However, currently the ID is a numeric autoincrement. I think it may be better to change it to an UNIX timestamp so that the only reason 2 newly created cards on different devices that are both offline could cause a conflict would be if they were created at the exact same second.

If it seems good to you, my plan would be to start with a Django app without any interface apart from the django-admin, with :

  • models,
  • authentication (I believe tokens would be more fitting, to be able to revoke a single phone if eg. it gets stolen)
  • API endpoints

That sounds great :)

The conflicts handling looks like the biggest issue, but it seems to me that most of the testing can be done in unit tests instead of creating a mock client -- this would come with the benefits of specifying clearly where user intervention is needed and where the merge can be automatic.

Having a way of dealing with this will be a requirement before going live but probably doesn't need to be part of an initial prototype. With unit tests, I suppose you mean on the client side? I wonder how to best detect conflicts. My naive method would be to send the old data together with the new data to see if a change was made since starting making this change, but I expect there to be much better solutions.

@tobast
Copy link

tobast commented Aug 30, 2022

The ID is the single most important element. It's the "primary key" in the SQLite database on the Android app. It is used when opening cards from the main screen, when referring to which cards are in a group, when setting and opening a shortcut widget, etc. The ID should never change. However, currently the ID is a numeric autoincrement. I think it may be better to change it to an UNIX timestamp so that the only reason 2 newly created cards on different devices that are both offline could cause a conflict would be if they were created at the exact same second.

I think I still prefer the idea of a uuid4. It seems to be exactly the kind of job it was designed to handle, and it also seems to me that it is the most common solution for this kind of problem — distributed IDs. I believe databases handle those seamlessly, too — postgresql has a type for it, SQLite would handle those as BLOBs.

As I see it,

  • Pros:
    • Unique for sure — even though creating multiple cards in the same second would never happen on the app, it would be possible if someone batch-creates those directly in the Django app or whatever. Not a big concern, but it might avoid weird issues later.
    • Quite standard, so it might be a bit easier in libs and a bit easier to understand for future devs
  • Cons:
    • Doesn't carry information (the creation timestamp is interesting in itself, a uuid is useless data)
    • A bit larger (not that anyone cares about 64 bits)

Did you have anything else in mind? What would be your preferred solution?

Having a way of dealing with this will be a requirement before going live but probably doesn't need to be part of an initial prototype. With unit tests, I suppose you mean on the client side? I wonder how to best detect conflicts. My naive method would be to send the old data together with the new data to see if a change was made since starting making this change, but I expect there to be much better solutions.

Yes, that would be client side, I believe. I was wondering whether there is a need to handle merging both on the client and server side, but it seems to me that the good solution to avoid this is to behave somehow like Git, ensuring that the user has pulled the latest version of the database before they try to push any data — otherwise it might just erase a modification that happened in between without knowing.

A way to do this might be to carry an incremental revision id for each card. When updating the card on the server, the client sends along the revision ID of the card before modification, which should be the revision the server is aware of. If not, the update fails, forcing the client to fetch the latest version before. This way, a client can also easily check if it has the latest version of a given card.

@TheLastProject
Copy link
Member Author

I think I still prefer the idea of a uuid4. It seems to be exactly the kind of job it was designed to handle, and it also seems to me that it is the most common solution for this kind of problem — distributed IDs.

You raise good points in favour of UUIDs. Sadly, Android itself doesn't seem to handle those well.

Looking at https://stackoverflow.com/a/14184946/8378787, it seems like Android wants the primary key to be an integer to be able to use a Cursor. And a Cursor is the preferred way of loading data from the database in Android as far as I can tell (and used all throughout Catima, it would be an enormous refactor). So it seems like we sadly need to stick to an integer format for this. I think, however, this is not something that is required to be fixed before working on the server part. Throwing a CONFLICT for now when 2 different devices have been offline and both create a card is reasonable until the ID generation in the Android app gets improved.

A way to do this might be to carry an incremental revision id for each card. When updating the card on the server, the client sends along the revision ID of the card before modification, which should be the revision the server is aware of. If not, the update fails, forcing the client to fetch the latest version before. This way, a client can also easily check if it has the latest version of a given card.

That sounds like a simple and logical way to deal with it, yeah. I even see this work if both devices were offline if the server was just configured to only deal with one change per card at a time.

@tobast
Copy link

tobast commented Aug 30, 2022

Looking at https://stackoverflow.com/a/14184946/8378787, it seems like Android wants the primary key to be an integer to be able to use a Cursor. And a Cursor is the preferred way of loading data from the database in Android as far as I can tell (and used all throughout Catima, it would be an enormous refactor). So it seems like we sadly need to stick to an integer format for this. I think, however, this is not something that is required to be fixed before working on the server part. Throwing a CONFLICT for now when 2 different devices have been offline and both create a card is reasonable until the ID generation in the Android app gets improved.

The documentation is actually extremely vague on the requirements for this _ID column, and the stackoverflow post only points out that if you store your uuid as a string (instead of a 16B blob), it will break CursorAdapter. I went through the source code to convince myself, and indeed, a field that can be used as a long is required, as seen here in the Android source code (mRowIdColumn is the id of the column _ID in the schema). I believe CursorAdapter could be inherited to override getItemId, but this starts to get seriously complicated for small benefits.

I think that the UNIX timestamp is a good way to go, then!

@tobast
Copy link

tobast commented Aug 31, 2022

Where do you prefer me to start the development? I cannot fork CatimaSync because the repository is empty.

@TheLastProject
Copy link
Member Author

Oops, I didn't notice forking an empty repo isn't possible. I've created a simple README file now so you can fork the repo :)

@tobast
Copy link

tobast commented Sep 1, 2022

I'm having second thoughts on the ID being a UNIX timestamp. Sure, there is a very small chance of a user creating two cards in the same second (at least manually); but if the ID is supposed to be unique also on the sync server, it means that there will be a conflict if two users create a card in the same second. This might start to look plausible if there is a big sync server with many users at some point. Even worse: there will be no conflict on the phone directly, but there will be a conflict when trying, later, to push the card to the server.

I doubt that there will ever be a server with one card creation per second on average, so sure, this would work:

while True:
    try:
        sync.push(card)
        break
    except IdNotUnique:
        card.id += 1

but it starts to look very inelegant, and if this happens 1h after the card was created (eg. no internet), this means that the phone app cannot hold for true that a card ID never changes (I don't know if this is something expected, but it sounds like something that would be implicitly assumed).

This, plus the fact that it might be necessary to migrate all sequential IDs to globally-unique IDs at some point when upgrading the app, makes me reconsider. I actually think that the best option here to be sure not to regret later is to keep sequential IDs on the phone, and add a new UUID column on the phone that is assumed to be globally-unique. The sequential ID remains only on the phone and is never sent to the server, and the UUID (DB index) is used whenever synchronizing. As a side effect, two phones synchronized to the same account will certainly have different sequential IDs for the same card, but I guess that's not a problem.

Does this look sound to you?

@TheLastProject
Copy link
Member Author

but if the ID is supposed to be unique also on the sync server

I don't think it needs to be. The ID is unique local on the device but I think on the server side it would be just as sensible to make the ID on the server be the account ID + card ID (so, a Composite Primary Key).

this means that the phone app cannot hold for true that a card ID never changes (I don't know if this is something expected, but it sounds like something that would be implicitly assumed).

If the card ID ever changes on the phone, it means all shortcut widgets are broken. Those could perhaps be fixable, but it would still add a lot of complexity I want to avoid. The ID should be considered unmutable on the client side.

the fact that it might be necessary to migrate all sequential IDs to globally-unique IDs at some point when upgrading the app

I don't understand why this is a "fact". I want to prevent migration ever being a thing because widgets. So it seems more likely to be a fact that the ID will never change on the device.

keep sequential IDs on the phone, and add a new UUID column on the phone that is assumed to be globally-unique. The sequential ID remains only on the phone and is never sent to the server, and the UUID (DB index) is used whenever synchronizing. As a side effect, two phones synchronized to the same account will certainly have different sequential IDs for the same card, but I guess that's not a problem.

That seems reasonable. As an added benefit, it would in the future make it easier to implement a card being shared by multiple accounts and update on all accounts it's added on.

Does this look sound to you?

It sounds okay. I suppose when a card doesn't have an UUID, it's considered new and the server will return the UUID the client should assign it locally?

There is one case I do worry about and that is users who will have been keeping their backup in sync between multiple devices manually and then start using this. But I guess your system will create duplicates which to me seems like the least problematic way to resolve such a conflict (as there's no data loss, only data duplication).

So, yeah, a new UUID column in the client and making the server talk using UUIDs seems like the cleanest possible solution. It also gives us added flexibility for the future. One thing I do find important is that we plan the format really well. Android uses SQLite, the server may use... not sure what, you named PostgreSQL before, that seems fine to me. If we're sure SQLite BLOB and PostgreSQL uuid are completely compatible... yeah, great :)

@tobast
Copy link

tobast commented Sep 2, 2022

It sounds okay. I suppose when a card doesn't have an UUID, it's considered new and the server will return the UUID the client should assign it locally?

I was actually considering generating the UUID on client-side at card creation (or in the DB migration for already existing cards), but both options are looking fine to me. I think that we need to assume anyway that there will never be a global UUID collision globally: if not, it would be necessary to handle the case where a user migrates from one server to another, and suddenly there is a UUID clash. Assuming this, the server is in no better position to generate the UUID anyway.

From a pragmatic point of view, I don't believe that there is any added complexity in generating the UUID on the server, and in the extremely unlikely case where there is a clash, it is easier to handle, so yeah, sounds good.

There is one case I do worry about and that is users who will have been keeping their backup in sync between multiple devices manually and then start using this. But I guess your system will create duplicates which to me seems like the least problematic way to resolve such a conflict (as there's no data loss, only data duplication).

I see multiple ways to deal with this:

  • duplicate approach: what you describe (sounds good, duplicates entries but whatever);
  • manual approach: tell those users to upgrade, sync to a server on one device, sync their DBs manually once more, then authenticate with a unique token on each device. This way the cards are shared and everything looks cleaner.
  • automatic approach: on the server, detect at card creation time that a card is an exact copy of one we already have, by the same user account. When assigning a new UUID to it, actually make it an alias and return the pre-existing UUID.

The last solution looks like the cleanest to me, but would break if a user somehow wants to strictly duplicate a card (up to its last-used date, which I don't believe is possible anyway using the app in legit ways).

So, yeah, a new UUID column in the client and making the server talk using UUIDs seems like the cleanest possible solution. It also gives us added flexibility for the future. One thing I do find important is that we plan the format really well. Android uses SQLite, the server may use... not sure what, you named PostgreSQL before, that seems fine to me. If we're sure SQLite BLOB and PostgreSQL uuid are completely compatible... yeah, great :)

I don't believe that this is actually necessary: I'd rather make sure the API calls and arguments are fool-proof, and converted on the device/server. It would be possible to transmit the binary blob corresponding to the UUID, but yeah, things could happen, endianness could be wrong at some point, we might not catch the bug, … I believe it's just safer to read the UUID blob from DB on the device and load it as a Java UUID on the phone, and send it over as a well-defined string-formatted UUID in the API (and conversely on the Python side). This way, all the storage backend is abstracted, and nothing weird should happen — and the overhead should be negligible.

@TheLastProject
Copy link
Member Author

automatic approach: on the server, detect at card creation time that a card is an exact copy of one we already have, by the same user account. When assigning a new UUID to it, actually make it an alias and return the pre-existing UUID.
The last solution looks like the cleanest to me, but would break if a user somehow wants to strictly duplicate a card (up to its last-used date, which I don't believe is possible anyway using the app in legit ways)

I think that's actually exactly what the duplicate feature does as it just copies the old card. But maybe a new duplicate should always have lastUsed set to the current time. It kinda sounds like a bug that a duplicate would have an used time equal to the card it's duplicated from. I've created #1006 for it.

I think the automatic approach would be better for most users, but it still worries me to have "magic" like this. I think we should at least give a reason in the creation endpoint if we go this route so the client can know this happened. Something like:

{"uuid": "abcdef", "reason": "alreadyExists"}

and send it over as a well-defined string-formatted UUID in the API (and conversely on the Python side)

Ah yeah, that's what I meant. I more meant that we need to be careful about the formats. But a SQLite BLOB should store any value so that should be fine (although I'm not completely sure why it'd need a blob on the SQLite Android app side instead of a string, I'm not sure how to store a blob in a .csv output file anyway but a string will work fine).

@tobast
Copy link

tobast commented Sep 3, 2022

I think the automatic approach would be better for most users, but it still worries me to have "magic" like this. I think we should at least give a reason in the creation endpoint if we go this route so the client can know this happened. Something like:

{"uuid": "abcdef", "reason": "alreadyExists"}

Looks fair.

Ah yeah, that's what I meant. I more meant that we need to be careful about the formats. But a SQLite BLOB should store any value so that should be fine (although I'm not completely sure why it'd need a blob on the SQLite Android app side instead of a string, I'm not sure how to store a blob in a .csv output file anyway but a string will work fine).

I was still thinking about BLOBs as I once thought this could be an ID, but maybe you're right and a string is good enough. A BLOB is less bytes in DB (maybe faster search, then? I'd need to benchmark it to check this), but arguably it's negligible; on the other side, load to/from strings is trivial from both Java and Python built-in UUID class, although I expect BLOB to be not much harder.

@obfusk obfusk mentioned this issue Oct 15, 2023
49 tasks
@m0byn
Copy link

m0byn commented Jan 11, 2024

Risking the fact that this might be a more technical discussion and that this post is not as sophisticated as others I want to support this issue highly and highlight the usefulness and increased UX if it is implemented. Please implement it, if possible!

Other than that the app is amazing - keep up the good work!

@Korb
Copy link

Korb commented Jun 28, 2024

How is this issue fundamentally different from “Simultaneous usage on two or more devices: thoughts” (#20)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common: occasional Affects or can be seen by some users regularly or most users rarely severity: major Severely degrades major functionality or product features, with no satisfactory workaround type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants