Batch application of regular expressions to transform EAD files.
Original idea: Given a set of match and replacement patterns as a JSON file, the transformer will apply those transformations in series to the set of non-hidden files in a specified import directory.
Update: In practice, encoding of files needs to be sorted out first. Proposed strategy for encoding verification:
- open file as binary, read file and decode from UTF-8 (strict).
- if illegal characters are found:
- store contents decoded from UTF-8 as Python unicode object;
- open file, read, and decode from Windows-1252 (or Latin-1);
- compare result to UTF-8 version using difflib, and present differences to the user for verification;
- repeat as necessary until a valid decoded version is found.