Skip to content

Latest commit

 

History

History
13 lines (10 loc) · 808 Bytes

README.md

File metadata and controls

13 lines (10 loc) · 808 Bytes

ead-transform

Batch application of regular expressions to transform EAD files.

Original idea: Given a set of match and replacement patterns as a JSON file, the transformer will apply those transformations in series to the set of non-hidden files in a specified import directory.

Update: In practice, encoding of files needs to be sorted out first. Proposed strategy for encoding verification:

  1. open file as binary, read file and decode from UTF-8 (strict).
  2. if illegal characters are found:
  3. store contents decoded from UTF-8 as Python unicode object;
  4. open file, read, and decode from Windows-1252 (or Latin-1);
  5. compare result to UTF-8 version using difflib, and present differences to the user for verification;
  6. repeat as necessary until a valid decoded version is found.