checksum mismatch vs missing #32
To summarize: we need to review the error-handling logic for a checksum mismatch versus a missing checksum. A missing checksum should be logged as such and, by default, a checksum should be generated and added to the EPrints database, but only if the file actually belongs to the eprint and is not a leftover file from a previous export. A checksum mismatch should still skip-proceed by default, whereas a missing checksum should just produce a log message stating that a checksum was generated and added for the file. This behavior (on-checksum-missing) could also be controlled by a config flag taking skip-proceed|halt|generate, with generate as the default.
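A minimal sketch of how the proposed on-checksum-missing flag could be dispatched, assuming the file has already been confirmed to belong to the eprint. Python is used purely for illustration (the plugin itself is Perl), and `OnMissing`, `ExportHalted`, `handle_missing_checksum` and the `db` dict standing in for the EPrints database are all hypothetical names, not part of the plugin.

```python
import hashlib
import logging
from enum import Enum
from pathlib import Path

log = logging.getLogger("archivematica_export")


class OnMissing(Enum):
    """Hypothetical values for the proposed on-checksum-missing config flag."""
    SKIP_PROCEED = "skip-proceed"
    HALT = "halt"
    GENERATE = "generate"  # proposed default


class ExportHalted(Exception):
    """Raised when the configured policy says the export must stop."""


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def handle_missing_checksum(path: Path, db: dict,
                            policy: OnMissing = OnMissing.GENERATE) -> bool:
    """Apply the on-checksum-missing policy to a file with no stored MD5.

    `db` is a stand-in for the EPrints database (filename -> MD5).
    Returns True if the file should be included in the export.
    """
    if policy is OnMissing.GENERATE:
        db[path.name] = md5_of(path)
        log.info("Missing checksum for %s: generated and stored %s",
                 path.name, db[path.name])
        return True
    if policy is OnMissing.HALT:
        raise ExportHalted(f"Missing checksum for {path.name}")
    # skip-proceed: leave this file out, continue with the rest of the export
    log.warning("Missing checksum for %s: skipping file", path.name)
    return False
```

With the default policy, a file containing `hello` gets the MD5 `5d41402abc4b2a76b9719d911017c592` written into `db` and is kept in the export; `skip-proceed` drops the file and `halt` aborts.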
I added some code to differentiate between the two issues: checksum MISMATCH vs. MISSING.
The missing-checksum case is now resolved by commit c754a3e. The remaining to-do item is controlling what happens on a checksum mismatch:
A checksum MISMATCH should only occur when the EPrints database already holds a checksum for a file and it doesn't match the checksum being computed. In the case of a MISMATCH, what the system does should be controlled by this option. From the documentation:
This option still needs to be implemented; it is not yet in the code.
However, a MISMATCH is not the same as a MISSING checksum in the EPrints database for a file/document. In that case, the system should do the following (from the documentation):
The relevant code is here; it needs to distinguish between the two cases, MISSING vs. MISMATCH:
EPrintsArchivematica/lib/plugins/EPrints/Plugin/Export/Archivematica/EPrint.pm, line 341 (at commit 9d5c1cc)
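As an illustration only (Python rather than the plugin's Perl), one way to keep the two cases apart is to classify each file first and only then route it. `ChecksumStatus`, `classify`, `route` and the `on_mismatch` option name are hypothetical, since the real option name is not given here; "skip-proceed" is read as "leave this file out and continue with the export".

```python
import hashlib
from enum import Enum
from pathlib import Path
from typing import Optional


class ChecksumStatus(Enum):
    OK = "ok"
    MISSING = "missing"    # no MD5 stored in EPrints for this file
    MISMATCH = "mismatch"  # an MD5 is stored but differs from the file on disk


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path: Path, stored_md5: Optional[str]) -> ChecksumStatus:
    """Distinguish MISSING (nothing stored) from MISMATCH (stored but different)."""
    if not stored_md5:
        return ChecksumStatus.MISSING
    # hexdigest() is lowercase, so normalise the stored value before comparing
    if md5_of(path) != stored_md5.strip().lower():
        return ChecksumStatus.MISMATCH
    return ChecksumStatus.OK


def route(status: ChecksumStatus, on_mismatch: str = "skip-proceed") -> str:
    """Decide what happens to a file; on_mismatch is a hypothetical option."""
    if status is ChecksumStatus.OK:
        return "export"
    if status is ChecksumStatus.MISSING:
        # Handled by the separate on-checksum-missing policy, never by on_mismatch.
        return "apply-on-checksum-missing"
    if on_mismatch == "halt":
        raise RuntimeError("checksum mismatch: export halted")
    # skip-proceed: leave this file out and carry on with the rest of the export
    return "skip"
```

The key design point is that an empty or absent stored value never reaches the comparison, so it can't be misreported as a MISMATCH.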
UPDATE: for files with no MD5 in the EPrints database, there is also a THIRD possible error, which I did encounter: a pre-existing file in the "objects" directory of the export folder that doesn't belong to the eprint currently being exported. This happens because the current EPrint export algorithm doesn't delete the objects folder before writing to it, so a file from a previous export can end up there. Such a file would not have a corresponding hash in the database either. I am therefore adding "check that the file belongs to this eprint" to the "no checksum in the database" error above.
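A minimal Python sketch of the two obvious remedies for leftover files: flag anything in "objects" that is not one of this eprint's files, or clear the directory before each export. The function names are hypothetical, and the eprint's file list is assumed to be available as paths relative to "objects"; how the plugin gets that list from EPrints is not shown.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def find_stray_files(objects_dir: Path, eprint_filenames: Iterable[str]) -> List[Path]:
    """Return files under objects_dir that are not among this eprint's files."""
    expected = set(eprint_filenames)
    return sorted(
        p for p in objects_dir.rglob("*")
        if p.is_file() and p.relative_to(objects_dir).as_posix() not in expected
    )


def reset_objects_dir(objects_dir: Path) -> None:
    """Delete and recreate objects_dir so no file from a previous export survives."""
    if objects_dir.exists():
        shutil.rmtree(objects_dir)
    objects_dir.mkdir(parents=True)
```

Clearing the directory up front prevents the problem entirely; the stray-file check is still useful as a guard so that a file with no stored checksum is only given a generated one when it really belongs to the eprint.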