the fonts fas traineddata #10

TAQBIBT · 2019-07-21T20:26:57Z

Thanks for uploading this trained model - could you possibly provide some info about the training data?

Specifically the fonts used and the text used for fas-script-float

Thanks!

Shreeshrii · 2019-07-22T05:00:28Z

It has been a while since I ran that training and I don't have the files saved.

Going by the commits in the git repo, i.e.

67b9593

c50e3a3

4e706d1

I think it was based on finetuning (for impact) the tessdata_best/script/Arabic model. I added the Arabic comma and other punctuation to the training_text and did not include the English letters [a-zA-Z] in the unicharset. The font used was most probably Arial Unicode MS.
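As a rough illustration of the unicharset point above, the sketch below computes the character inventory of a training text with the ASCII letters [a-zA-Z] left out. It is not Tesseract's own unicharset tooling, and the function name `charset_without_latin` is hypothetical. It only shows which characters would remain if the English letters were excluded.

```python
import string

# ASCII letters a-z and A-Z, the characters left out of the unicharset.
ASCII_LETTERS = set(string.ascii_letters)


def charset_without_latin(training_text):
    """Return the sorted non-whitespace characters of the text, minus [a-zA-Z].

    Hypothetical helper for illustration only; the real unicharset is
    produced by the Tesseract training tools.
    """
    chars = {ch for ch in training_text if not ch.isspace()}
    return sorted(chars - ASCII_LETTERS)


if __name__ == "__main__":
    # Sample text: Latin letters, the Arabic comma (U+060C) and two Arabic letters.
    sample = "abc \u060C \u0633\u0644 XYZ"
    for ch in charset_without_latin(sample):
        print(f"U+{ord(ch):04X}")
```

Running a check like this on the training_text before training is one way to confirm that the Arabic punctuation is present and that no Latin letters are left in.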

Shreeshrii · 2019-07-22T09:12:46Z

Please see tesseract-ocr/tessdata#70

Possibly I used the fonts recommended on that page - Roya, Nazanin etc.

anergui · 2019-07-22T15:01:52Z

Thanks Shreeshrii
Please, I cannot train the Arabic language with the OCRD-train fork that you proposed at this link: https://github.com/Shreeshrii/ocrd-train
Are the tiff and gt.txt files prepared the same way as for LTR languages or not?
Can I start from one of the traineddata files that you have proposed, for example fas-script-float?

Sorry for the inconvenience

Shreeshrii · 2019-07-26T12:19:08Z

fas-script-float is for Persian/Farsi. The numerals for Farsi and Arabic are different.
But it is a float model, similar to the tessdata_best and can be used as base for further training.
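To make the numeral difference concrete, here is a minimal sketch based on the Unicode code points: Arabic uses the Arabic-Indic digits U+0660 to U+0669, while Persian/Farsi uses the Extended Arabic-Indic digits U+06F0 to U+06F9. The helper names `digit_table` and `to_persian_digits` are hypothetical and not part of any training tool.

```python
import unicodedata

# Arabic-Indic digits (used in Arabic) and Extended Arabic-Indic digits
# (used in Persian/Farsi), both listed in order 0..9.
ARABIC_INDIC = [chr(cp) for cp in range(0x0660, 0x066A)]
PERSIAN = [chr(cp) for cp in range(0x06F0, 0x06FA)]


def digit_table():
    """Return (value, Arabic code point, Persian code point) for 0..9."""
    return [
        (value, f"U+{ord(ARABIC_INDIC[value]):04X}", f"U+{ord(PERSIAN[value]):04X}")
        for value in range(10)
    ]


def to_persian_digits(text):
    """Map Arabic-Indic digits in text to their Persian counterparts."""
    table = str.maketrans("".join(ARABIC_INDIC), "".join(PERSIAN))
    return text.translate(table)


if __name__ == "__main__":
    for value, arabic_cp, persian_cp in digit_table():
        print(value, arabic_cp, persian_cp)
    # Both digit sets carry the same numeric value in the Unicode database.
    print(unicodedata.digit(ARABIC_INDIC[5]), unicodedata.digit(PERSIAN[5]))
```

Because the two digit sets are distinct code points, ground truth for further training should use the set that matches the target language.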

Regarding ocrd-train, I only have a fork of the project, with a suggested change to makefile to use 'wordstrbox' option for creating box files for complex scripts.

However, I have not personally tried it for Arabic, as I do not know the language/script and so it is difficult for me to ascertain that it is working correctly.

arrrrny mentioned this issue Oct 17, 2019

None of the traineddata works for me #14

Open
