the fonts fas traineddata #10
Comments
It has been a while since I ran that training, and I don't have the files saved. Going by the commits in the git repo, I think it was based on fine-tuning (for impact) the tessdata_best/script/Arabic model. I had added the Arabic comma and other punctuation to the training_text and had not included the English letters [a-zA-Z] in the unicharset. The font used was most probably
Please see tesseract-ocr/tessdata#70. Possibly I used the fonts recommended on that page: Roya, Nazanin, etc.
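For readers who want to prepare a similar training text, here is a minimal sketch of the filtering described above: dropping lines that contain English letters and checking that the Arabic comma is present in the resulting character set. It is not the script used for this model (those files were not saved). The function names and the sample lines are hypothetical and exist only for illustration.

```python
import re

# ASCII letters that were left out of the unicharset, per the comment above.
LATIN = re.compile(r"[a-zA-Z]")
ARABIC_COMMA = "\u060C"  # U+060C ARABIC COMMA


def drop_latin_lines(lines):
    """Keep only the lines that contain no ASCII letters (hypothetical helper)."""
    return [ln for ln in lines if not LATIN.search(ln)]


def charset(lines):
    """Return the set of distinct non-whitespace characters in the lines."""
    return {ch for ln in lines for ch in ln if not ch.isspace()}


# Made-up sample lines, not taken from the actual training text.
sample = [
    "سلام" + ARABIC_COMMA + " دنیا",
    "hello world",
    "متن",
]

kept = drop_latin_lines(sample)
chars = charset(kept)
```

Checking `ARABIC_COMMA in chars` before training is a cheap way to confirm that the punctuation you added to the text actually survived the filtering.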
Thanks, Shreeshrii. Sorry for the inconvenience.
fas-script-float is for Persian/Farsi. The numerals for Farsi and Arabic are different. Regarding ocrd-train, I only have a fork of the project, with a suggested change to the Makefile to use the 'wordstrbox' option for creating box files for complex scripts. However, I have not personally tried it for Arabic: I do not know the language or script, so it is difficult for me to confirm that it is working correctly.
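To make the point about numerals concrete: in Unicode, the digits used for Arabic (Arabic-Indic, U+0660 to U+0669) and those used for Persian (Extended Arabic-Indic, U+06F0 to U+06F9) are separate code points, so a model trained on one set will not necessarily output the other. The sketch below only illustrates that distinction. The `digit_family` helper is a hypothetical name, and nothing here claims which digits a given traineddata file actually contains.

```python
import unicodedata

# The two digit blocks as defined by Unicode.
ARABIC_INDIC = [chr(c) for c in range(0x0660, 0x066A)]   # used for Arabic
PERSIAN = [chr(c) for c in range(0x06F0, 0x06FA)]        # used for Persian/Farsi


def digit_family(ch):
    """Classify a single character by digit block (hypothetical helper)."""
    if "\u0660" <= ch <= "\u0669":
        return "arabic-indic"
    if "\u06F0" <= ch <= "\u06F9":
        return "extended-arabic-indic"
    return None


# Both characters carry the numeric value 4 but are different code points.
arabic_four = "\u0664"
persian_four = "\u06F4"
same_value = unicodedata.digit(arabic_four) == unicodedata.digit(persian_four)
same_char = arabic_four == persian_four
```

Running `digit_family` over OCR output or over a training text is a quick way to see which digit set is in use.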
Thanks for uploading this trained model - could you possibly provide some info about the training data?
Specifically the fonts used and the text used for fas-script-float
Thanks!