-
-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge imports #1394
base: main
Are you sure you want to change the base?
merge imports #1394
Conversation
After this change, the only errors ignored during import should be duplicate groups (which is fine), unless the data being imported does not conform to the expected format somehow, which should not happen under any normal circumstances. But a lot of the code still doesn't actually check for errors. |
It looks like the parser is handling that already. So I can't guarantee any overlooked edge cases, but I'm not seeing any potential errors being overlooked now, except for duplicate groups and group mappings, which are perfectly fine to ignore, or something being wrong with the database itself. But I don't expect the parser to produce invalid/conflicting data, and I expect valid data to not result in errors. I still think error checking is a good idea anyway. But it doesn't look like a high priority after this. |
There could be an NPE from invalid data. But that should be caught by the importer and cause the import to fail. |
I'm wondering, wouldn't it be cleaner to change how the As I understand your change this doesn't "merge" imports, it just assumes that anything in the import doesn't exist yet so it switches the brokenness from silently dropping cards to duplicating cards which I agree is an improvement but still not the desirable status. |
See also #513 and especially #512 (comment) for what feels like a better flow. And CatimaLoyalty/Docs#4 (comment). |
We would still need to deal with existing exported data using old IDs though. So that would add complexity. And might still not produce the desired result if you e.g. add the same card to two devices and then want to import one into the other. This would pretty much only help avoid duplicating the exact same card added on the same device. And then you could still have two different versions that were modified independently on different devices or when merging exports from different times.
Fair enough. The point here is to make sure nothing gets overwritten/broken. Having to delete a few duplicates manually is much better than the current situation. I was focused on that, and not so much on dealing with duplicates. Adding a conflict dialog like Nextcloud seems like quite a bit of work, and since duplicate file/card names are not an issue here, I'm not sure it makes sense to do the same. If you want to go that route I'd suggest merging this for now and making a separate PR for the additional work. It would be relatively easy to add something like this though:
I can add that to this PR if you'd like. |
Also note that only catima exports contain IDs. If we want "intelligent merging", it seems good to apply that to all other import formats as well, not just catima. For consistency. |
A fourth option would be to prefer latest |
Or we could merge this as-is and I'll make a follow-up PR for that. Let me know what you prefer. |
actually, I think this would not be too hard after all:
I think I'll implement that in what do you think? |
The main issue is that the current import code needs a complete rewrite anyway (see #513). Because it'll require moving the whole thing into a background thread and requires spawning a notification, using a notification on conflicts and offering options seemed easier but I guess it probably is still challenging. My main concern is complicating the current importing code and making moving away from the current deprecated and crashy method more work. My secondary concern is switching to a new method that won't work well for multi-device sync and I think randomly increasing the IDs if the ID is already in use will be one of those things where it'll severely complicate multi-device sync and risk the possibility of getting stuck in loops where IDs keep getting incremented each sync because they keep conflicting. Aside from that, messing with the IDs seems very hacky, like the kind of think that will bite us in the future. You do raise a good point with that other importers don't share Catima's ID system and the only combination of fields we can reasonably assume to always exist is name, type and value and perhaps merging based on those values makes most sense. It would be nice to detect duplicates based on those fields but without any way to notify users a duplicate was found that is currently useless. So, in my mind, the cleanest solution is:
I don't really think writing merging code while the importing code is in such a broken state and we can't reliably notify users of conflicts makes a lot of sense as it'll just translate into more code to move to the new system and more things to test before the new system can be considered stable. |
That wasn't something I was considering in this context. But good point. We should address that.
#1389 does that. So feel free to merge that as-is (or request changes). Unfortunately, if the import fails the DB will be unchanged but images may have been overwritten. Preventing that would be quite hacky with the current code, even more so than this PR which at least should keep both old and new data completely intact.
That does seem like a good idea. Sadly, I have no experience with that so it'll take more time to figure out. But I'd like to help with that. Any pointers on how to get started with that?
My proposal above is an attempt to formulate such a merging strategy. It could almost certainly use some tweaking but the basic concept seems sound to me personally, though I could easily have missed a use case I don't have myself that it doesn't cover. Feedback on that idea so we can figure out the best merge strategy before implementing it would be appreciated. I would then like to implement that in And yes, if we're going to implement a UI for conflict resolution, we should probably do it as part of the rewrite and not based on the current implementation. |
I may be missing something but I'd expect this to be fairly simple to solve?
But maybe it's better to just do nothing until we have a good fix because any workaround we do not we'll just have to revert again in the future.
At #622 (comment) I talked about using a Service but I might be wrong there too. I don't know what the recommended Android method is for long-time background processing (maybe WorkManager?)
I see one big issue with this: asking the user what they want. You can't use that for sync constantly, you'd have to bother users every sync. I'm thinking we should just change the ID to be the current UNIX timestamp on creation and consider any difference in field a conflict and offer the option to using 1, 2 or keep both (with keeping both we can insert the new card with the current timestamp). That way, the only conflicts anyone could possibly get would be multiple users saving a card on the exact same second which seems unlikely. |
In theory, yes. But currently, parsing the data and updating the database is interleaved with saving the new images. So existing images may have been overwritten by the time the database transaction is aborted. We could e.g. save new images with a But to me this PR seemed the better interim solution as it guarantees that no data is lost or overwritten. Though of course it may be duplicated, which is clearly a concern for you. Just let me know how you want to proceed :) |
I agree it makes sense to use timestamps to avoid duplicate IDs. That does indeed make syncing and importing easier since we rarely need to modify the ID. It does not however solve the problem with existing IDs (and imports from formats without IDs). As I see it, that leaves us with: 1. Same ID, 100% identical data:Trivial :) Though what about e.g. two identical cards with different images? 2. Same ID, different data (e.g. modified independently, could be a completely different name or barcode or just a modified note or colour):
Agreed. But what will we do next sync? Also: last used timestamp will likely differ, so maybe we should ignore that one field (not sure about header colour and star status). 3. Different ID, 100% identical data (e.g. adding the exact same card on two devices independently):We need to choose which ID to use, but that should not be too hard. Also: last used timestamp will likely differ, leading us to 4 unless we ignore that. 4. Different ID, not 100% identical data:This is the hard one. I think if e.g. the name and barcode (I suggested letting the user choose which fields to use for that, but that seems like a good default) are identical we should consider this a conflict as well and handle it the same way as 2. Also: this could happen to more than one pair of cards at a time. And what will we do next sync? If we keep both, there should not be a conflict as we have an identical card with the same ID. Also: how is sync intended to work? Unidirectional? Bidirectional? Will it remember how conflicts were handled before? |
Replaces #1389.
This renumbers IDs -- both from the CSV and in the image file names -- that are present in the imported data, to make sure they do not conflict with the existing data.
It also removes the newly added image files if the import fails, which is now safe since there are no conflicts with existing IDs.
Only the Catima importer has to deal with IDs being present in the imported data; all other importers create new IDs, so they are not affected.
I'm not really handling duplicate group names, but errors from
DBHelper.insertGroup()
are ignored and I think it makes sense to just effectively merge groups with identical names anyway; we do need to deal with that when we have proper group IDs.