alto-tools

A Python 3 script for performing various operations on ALTO XML files.

Usage

  • extract UTF-8 text content from ALTO file

    python3 alto_tools.py alto.xml -t

  • extract page OCR confidence score from ALTO file

    python3 alto_tools.py alto.xml -c

  • extract bounding boxes of illustrations from ALTO file

    python3 alto_tools.py alto.xml -l
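For orientation, the three operations above map onto a few elements of the ALTO schema: `String` elements carry the recognized text in `CONTENT` and an optional word confidence in `WC`, and `Illustration` elements carry `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`. The following is a minimal sketch of that mapping, not the code in `alto_tools.py`. The function names are hypothetical, and averaging `WC` values to get a page score is an assumption about how such a score could be computed.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_text(root):
    """Return the page text, one ALTO TextLine per output line."""
    lines = []
    for el in root.iter():
        if local(el.tag) == "TextLine":
            words = [s.get("CONTENT", "") for s in el.iter()
                     if local(s.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


def alto_confidence(root):
    """Mean of the per-word WC attributes (assumed scoring), or None if absent."""
    scores = [float(s.get("WC")) for s in root.iter()
              if local(s.tag) == "String" and s.get("WC") is not None]
    return sum(scores) / len(scores) if scores else None


def alto_illustrations(root):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) as strings for each Illustration."""
    keys = ("HPOS", "VPOS", "WIDTH", "HEIGHT")
    return [tuple(el.get(k) for k in keys)
            for el in root.iter() if local(el.tag) == "Illustration"]


# A tiny hand-written ALTO v3 fragment, used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine>
        <String CONTENT="Hello" WC="0.5"/><SP/><String CONTENT="ALTO" WC="1.0"/>
      </TextLine>
    </TextBlock>
    <Illustration ID="i1" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="200"/>
  </PrintSpace></Page></Layout>
</alto>"""

root = ET.fromstring(SAMPLE)
print(alto_text(root))
print(alto_confidence(root))
print(alto_illustrations(root))
```

Matching on the local tag name avoids hard-coding one ALTO namespace, since the namespace URI differs between schema versions.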

Planned

  • write output to file(s) - currently all output is sent to stdout

    python3 alto_tools.py alto.xml [OPTION] -o
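Because all output currently goes to stdout, it can already be captured and written to a file by the caller. The sketch below shows one way to do that from Python. The helper `save_stdout` is hypothetical and not part of alto-tools, and the commented call assumes `alto_tools.py` and `alto.xml` are in the current directory.

```python
import subprocess
import sys


def save_stdout(cmd, out_path):
    """Run cmd, write its stdout to out_path as UTF-8, return the character count."""
    result = subprocess.run(cmd, capture_output=True, text=True,
                            encoding="utf-8", check=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return len(result.stdout)


# Example call (assumes alto_tools.py and alto.xml exist in the working directory):
# save_stdout([sys.executable, "alto_tools.py", "alto.xml", "-t"], "alto.txt")
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` before the output file is opened, so a failed run does not create or overwrite it.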