⚠️ Add tests for filename encoding inconsistencies #1

Open
1 of 2 tasks
acdha opened this issue Feb 26, 2016 · 5 comments

Comments

@acdha
Member

acdha commented Feb 26, 2016

Some tests will probably require a helper utility that creates files on the local filesystem, so we avoid fighting the filename normalization performed by Git, the operating system, or an archival utility. That normalization is usually a great feature, but it complicates reliably testing error cases:

  • Test for filenames which are Unicode equivalent but normalized differently from the manifest
  • Test for manifests listing the same filename using different normalization forms
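
A rough sketch of what such a helper could look like, assuming Python and a UTF-8 filesystem encoding. The function name `create_normalized_file` is hypothetical, not part of any existing tool. It requests a name in a specific normalization form and reports what actually ended up on disk, since the filesystem may silently re-normalize it:

```python
import os
import unicodedata


def create_normalized_file(directory, name, form):
    """Create an empty file whose name is normalized to `form` ("NFC" or "NFD").

    Returns (requested_name, names_found_on_disk). The two may differ
    because some filesystems re-normalize names when files are created.
    """
    requested = unicodedata.normalize(form, name)
    with open(os.path.join(directory, requested), "wb"):
        pass  # an empty file is enough; only the name matters here
    return requested, os.listdir(directory)
```

Comparing the requested name with the listing tells a test whether the local filesystem preserved the form, which determines whether the error case can be reproduced on that machine at all.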
@johnscancella
Contributor

see v0.97/warning/filename-normalization
and v0.97/warning/same-filename-listed-twice-with-different-normalization

@acdha acdha reopened this Jan 27, 2017
@acdha
Member Author

acdha commented Jan 27, 2017

The simple file-in-the-repo approach will only work if you have the right combination of local filesystem and Git configuration.

@acdha
Member Author

acdha commented Jan 27, 2017

Since we cannot rely on the filename being preserved in any way, the best way to test this would probably be to have two files and ensure that the manifest uses a different normalization form for each, so a filesystem or DVCS which normalizes names will break one of them.
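
One way this fixture could be generated, as a minimal sketch. The file names, payloads, the `build_fixture` helper, and the choice of SHA-256 are all assumptions made for illustration:

```python
import hashlib
import os
import unicodedata

# Hypothetical base names; each contains a character that decomposes under NFD.
BASE_NAMES = {"NFC": "caf\u00e9.txt", "NFD": "na\u00efve.txt"}


def build_fixture(bag_dir):
    """Write two payload files and a manifest that lists one name in NFC
    and the other in NFD. Returns the (digest, path) manifest entries."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    entries = []
    for form, base in BASE_NAMES.items():
        name = unicodedata.normalize(form, base)
        payload = form.encode("ascii")
        # The filesystem may store this name in a different form.
        with open(os.path.join(data_dir, name), "wb") as handle:
            handle.write(payload)
        entries.append((hashlib.sha256(payload).hexdigest(), "data/" + name))
    manifest_path = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as manifest:
        for digest, path in entries:
            manifest.write(f"{digest}  {path}\n")
    return entries
```

Because the manifest text is written directly as UTF-8, its two forms survive regardless of what the filesystem does to the names on disk.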

@johnscancella
Contributor

@acdha
Member Author

acdha commented Jan 27, 2017

Both tests should be complete and standalone since they're not the same thing. That warning could be satisfied by an implementation which simply does a Unicode equivalence test on the manifest contents.
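
To illustrate how little the manifest-only check requires, here is a sketch (the function name `find_equivalent_duplicates` is hypothetical) that looks only at the manifest paths and never touches the filesystem:

```python
import unicodedata
from collections import defaultdict


def find_equivalent_duplicates(manifest_paths):
    """Group manifest paths that differ as written but are equal after
    NFC normalization. Returns a list of groups with more than one spelling."""
    groups = defaultdict(set)
    for path in manifest_paths:
        groups[unicodedata.normalize("NFC", path)].add(path)
    return [sorted(spellings) for spellings in groups.values() if len(spellings) > 1]
```

An implementation could pass the second test with this alone and still fail on a bag whose on-disk names were re-normalized.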

The first test isn't a warning but a compatibility check to confirm that the implementation will handle normalization differences between the manifest and filesystem, so it won't choke when given a bag which has passed through a Mac, Git, some ZIP tools, etc. The easiest way to do that would be to have two separate files and list one in the manifest normalized as NFC and the other as NFD, so a naive implementation will report an error if the filenames on disk are consistently normalized.
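
For comparison, a sketch of the kind of normalization-tolerant lookup the compatibility check is meant to exercise. The name `match_manifest_to_disk` and the choice of NFC as the comparison form are assumptions:

```python
import unicodedata


def match_manifest_to_disk(manifest_paths, disk_paths, form="NFC"):
    """Pair each manifest path with the on-disk path that is Unicode-equivalent
    to it. Returns (matched, missing): a dict of manifest path -> disk path,
    and a list of manifest paths with no equivalent on disk."""
    on_disk = {unicodedata.normalize(form, p): p for p in disk_paths}
    matched, missing = {}, []
    for path in manifest_paths:
        key = unicodedata.normalize(form, path)
        if key in on_disk:
            matched[path] = on_disk[key]
        else:
            missing.append(path)
    return matched, missing
```

With a manifest listing one NFC and one NFD name, and a disk listing where both names are NFD, an exact string comparison reports the NFC entry as missing, while this lookup matches both.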
