calamari-predict truncates filename #308

stefanCCS · 2022-03-01T15:58:14Z

If the image to be "OCRed" has more than one '.' in its filename, part of the resulting filename is truncated.
E.g.:
something.else.png --> something.pred.txt instead of something.else.pred.txt
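
To make the difference concrete, here is a minimal Python sketch of the two naming rules. It is not taken from the calamari source; both function names are hypothetical, and the first one only assumes that the reported output comes from cutting the basename at the first dot.

```python
import os


def pred_name_first_dot(image_path):
    """Hypothetical: cut the basename at the FIRST dot (matches the reported output)."""
    directory, base = os.path.split(image_path)
    stem = base.split(".", 1)[0]
    return os.path.join(directory, stem + ".pred.txt")


def pred_name_last_dot(image_path):
    """Hypothetical: drop only the final extension (the output I expected)."""
    root, _ext = os.path.splitext(image_path)
    return root + ".pred.txt"


if __name__ == "__main__":
    print(pred_name_first_dot("something.else.png"))
    print(pred_name_last_dot("something.else.png"))
```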

andbue · 2022-03-01T18:18:22Z

Right, that's a little bit annoying, I've struggled with that myself before. In ocropus, the image file names contain information on preprocessing (e.g. 001.bin.png) that has to be ignored. If we change the current behaviour, we might break support for legacy datasets. I don't know if ocr4all needs this - @chreul ?
Maybe we could either implement a command line switch to toggle file extension handling or just ignore a specific set of strings (bin, raw, nrm, maybe col?).
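
A rough Python sketch of how both options could fit together, just to illustrate the idea. Nothing here is existing calamari code: `pred_name`, `LEGACY_TAGS` and the `strip_all` flag are hypothetical names, and the tag set is simply the one listed above.

```python
import os

# Hypothetical set of ocropus-style preprocessing tags to ignore.
LEGACY_TAGS = {"bin", "raw", "nrm", "col"}


def pred_name(image_path, legacy_tags=LEGACY_TAGS, strip_all=False):
    """Build the .pred.txt name for an image path.

    strip_all=True  -> cut at the first dot (the behaviour reported in this issue).
    strip_all=False -> drop the final extension, then peel off trailing
                       parts only if they are known legacy tags.
    """
    directory, base = os.path.split(image_path)
    if strip_all:
        stem = base.split(".", 1)[0]
    else:
        stem, _ext = os.path.splitext(base)
        while True:
            head, tag = os.path.splitext(stem)
            # tag is like ".bin"; stop at the first part that is not a known tag
            if head and tag[1:] in legacy_tags:
                stem = head
            else:
                break
    return os.path.join(directory, stem + ".pred.txt")


if __name__ == "__main__":
    for name in ("001.bin.png", "001.nrm.bin.png", "something.else.png"):
        print(name, "->", pred_name(name))
```

With this sketch, 001.bin.png would still map to 001.pred.txt, while something.else.png would keep its middle part. The `strip_all` flag stands in for the command line switch and restores the behaviour reported above.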

maxnth · 2022-03-01T19:00:32Z

> I don't know if ocr4all needs this

OCR4all does currently need this, but we could just use a small wrapper / postprocessing script for it (and the newly written back end manages files differently anyway), so changing this wouldn't really be a problem for OCR4all.

stefanCCS · 2022-03-02T09:13:43Z

Well, in my opinion the current behaviour is unexpected for newcomers like myself.
I (and I assume any other newcomer) would like to see this changed - an additional command line switch would be ok, of course.
