Best traineddata feedback - Fraktur #65
The new
A short (still incomplete) review of that list shows lots of issues:
2017-09-11
Same as in the old (and most likely new) eng.traineddata. Seems to be normal.
In this case "normal" leads to unwanted effects. Tesseract uses those entries to decide about OCR results, and I see many of those uppercase words in my real OCR results. In most cases they are completely wrong (see for example these historic texts with
If there is a need for uppercase words in some rare cases, I'd expect that those words could be generated programmatically from the normal form. I see no need to fill the word list with them.
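To illustrate the idea of deriving uppercase forms from the normal form rather than storing them, here is a minimal sketch. It assumes the word list is available as a plain Python list of strings (one entry per word). The function name `drop_redundant_uppercase` is hypothetical and not part of Tesseract or its training tools.

```python
def drop_redundant_uppercase(words):
    """Remove all-caps entries whose normal form is already in the list.

    Hypothetical helper for illustration only; not a Tesseract API.
    An all-caps entry is dropped when its lowercase form ("und") or its
    capitalized form ("Haus") is also present, since the uppercase
    variant could be regenerated with str.upper() when needed.
    """
    known = set(words)
    kept = []
    for word in words:
        # Single letters are left alone; only multi-letter all-caps words qualify.
        is_all_caps = len(word) > 1 and word.isupper()
        if is_all_caps and (word.lower() in known or word.capitalize() in known):
            continue
        kept.append(word)
    return kept


if __name__ == "__main__":
    sample = ["Haus", "HAUS", "und", "UND", "USA", "A"]
    print(drop_redundant_uppercase(sample))
```

All-caps entries without a normal form in the list (such as abbreviations) are kept, so only the redundant variants are removed.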
List of important missing characters in
From issue #62: