What files go into `Training_Data` folder, and `Validation_Data`? #2

ciobania · 2022-08-14T19:11:38Z

Hello,

Thanks for sharing this, and making it easier to work with.

What exactly goes into the .box files? Are those files generated via JTextEditor/QtTextEditor, per line, or per character?

How come we don't need truth files, as the documentation says? Admittedly, it has not changed in years, so it may be obsolete.


RawthiL · 2022-08-15T12:43:20Z

Hello,

The .box files contain whole words, split into letters, something like this:

H 237 2686 593 2743 0
E 237 2686 593 2743 0
L 237 2686 593 2743 0
L 237 2686 593 2743 0
O 237 2686 593 2743 0
	 237 2686 593 2743 0
T 242 2625 735 2676 0
H 242 2625 735 2676 0
E 242 2625 735 2676 0
R 242 2625 735 2676 0
E 242 2625 735 2676 0

Those are two separate words, in two different bounding boxes (BBoxes). Each letter has its own entry, but the BBox is the same for the whole word.
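To make the layout concrete, here is a minimal Python sketch that reproduces the listing above from a list of words and their BBoxes. The function name `words_to_box` and the input shape are made up for illustration; this is not the script I actually used. It assumes the fields are `left bottom right top page`, and that the tab entry between words reuses the BBox of the word before it, as in the listing.

```python
def words_to_box(words, page=0):
    """Build .box lines from (text, (left, bottom, right, top)) pairs.

    Every character of a word gets its own line, but all lines of a word
    share that word's bounding box. A tab entry separates consecutive
    words and reuses the box of the preceding word (assumption taken
    from the listing above).
    """
    lines = []
    for index, (text, bbox) in enumerate(words):
        left, bottom, right, top = bbox
        coords = f"{left} {bottom} {right} {top} {page}"
        for char in text:
            lines.append(f"{char} {coords}")
        if index < len(words) - 1:
            # Separator between this word and the next one.
            lines.append(f"\t {coords}")
    return lines


example = words_to_box([
    ("HELLO", (237, 2686, 593, 2743)),
    ("THERE", (242, 2625, 735, 2676)),
])
print("\n".join(example))
```

Writing the result to disk is then just a matter of joining the lines with newlines and saving them next to the image under the same base name.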

I built these .box files from the annotation file produced by the VIA software. VIA creates a .json file, which I parsed to generate the .box files and the .tiff images.
I tried to use the JTextEditor/QtTextEditor software, but I needed some extra functionality that they did not have (nothing to do with the OCR training). I have no experience using them.
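As a rough idea of what the VIA parsing involves, here is a hedged Python sketch, not the actual parser. It assumes a VIA 2.x style entry where `regions` is a list, each region has `shape_attributes` (a `rect` with `x`, `y`, `width`, `height`) and `region_attributes`, and that the transcription lives in an attribute called `text` (that name is hypothetical, since it depends on how the VIA project was set up). VIA measures `y` from the top of the image, while .box files measure from the bottom, so the image height is needed to flip the coordinates.

```python
def via_rect_to_box(shape, image_height):
    """Convert a VIA rect (top-left origin) to .box coordinates
    (left, bottom, right, top) with a bottom-left origin."""
    left = shape["x"]
    right = shape["x"] + shape["width"]
    top = image_height - shape["y"]
    bottom = image_height - (shape["y"] + shape["height"])
    return left, bottom, right, top


def via_entry_to_words(via_entry, image_height, text_attr="text"):
    """Collect (text, bbox) pairs from one image entry of a VIA export.

    `text_attr` is a hypothetical attribute name; use whatever the
    annotation project actually defines.
    """
    words = []
    for region in via_entry.get("regions", []):
        shape = region["shape_attributes"]
        if shape.get("name") != "rect":
            continue  # this sketch only handles rectangles
        text = region.get("region_attributes", {}).get(text_attr, "")
        if text:
            words.append((text, via_rect_to_box(shape, image_height)))
    return words


# Hand-written entry in the assumed VIA layout, for illustration only.
sample_entry = {
    "filename": "page1.tiff",
    "regions": [
        {
            "shape_attributes": {"name": "rect", "x": 237, "y": 57,
                                 "width": 356, "height": 57},
            "region_attributes": {"text": "HELLO"},
        }
    ],
}
print(via_entry_to_words(sample_entry, image_height=2800))
```

The image height is passed in by hand here; in practice it would come from the image file itself.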

Regarding the truth files, I'm not sure which files you mean. I trained the Tesseract 5.x LSTM models; they only need the .box and .tiff files to create the .lstmf files, which are used for training.
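For reference, the .box plus .tiff pair is turned into a training file with Tesseract's `lstm.train` config, which writes a `.lstmf` file next to the given output base. Below is a small Python wrapper as a sketch; the function names are made up, and the `--psm 6` default is an assumption that may need changing for your images. The .box file is expected to sit next to the image with the same base name.

```python
import subprocess
from pathlib import Path


def lstmf_command(image_path, psm=6):
    """Build the tesseract call that produces <base>.lstmf from
    <base>.tiff and <base>.box (psm default is an assumption)."""
    image = Path(image_path)
    output_base = str(image.with_suffix(""))
    return ["tesseract", str(image), output_base,
            "--psm", str(psm), "lstm.train"]


def make_lstmf(image_path, psm=6):
    """Run the command; requires tesseract to be installed and on PATH."""
    subprocess.run(lstmf_command(image_path, psm), check=True)


print(" ".join(lstmf_command("page1.tiff")))
```

Calling `make_lstmf("page1.tiff")` would then run the command for a single image; looping over the files in `Training_Data` and `Validation_Data` covers the rest.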
