Source of scripts/Fraktur etc. #39

mikegerber · 2019-06-03T11:11:57Z

While the files in the top directory seem to come from the sources in the langdata repository, the source for some of the files in scripts/ is unclear:

scripts/Fraktur.traineddata has no matching file in langdata,
scripts/Japanese.traineddata also, etc.

The Data-Files wiki article does not mention scripts/Fraktur.

This adds to the existing confusion between the frk language (which is actually Fraktur, not Frankish), the Fraktur script model, and the legacy model deu_frak in the tessdata repository.

Shreeshrii · 2019-06-03T11:46:36Z

See

https://github.com/tesseract-ocr/tessdata_fast/blob/master/README.md

https://github.com/tesseract-ocr/langdata_lstm/blob/master/script/Fraktur.langs.txt

Shreeshrii · 2019-06-03T11:47:58Z

Also see tesseract-ocr/tessdata#65

mikegerber · 2019-06-03T13:38:08Z

Is langdata obsolete now that langdata_lstm exists?

Shreeshrii · 2019-06-03T13:45:36Z

The langdata files are appropriate for Tesseract 3 or for legacy/base versions using Tesseract 4. They can also be used for fine-tuning, which requires a smaller input training text.

stweil · 2019-06-13T17:42:48Z

As @Shreeshrii already said, langdata_lstm is for LSTM models while langdata is for legacy models. Both kinds of models are still used.

The script models are mixtures of different languages. script/Fraktur, for example, combines enm+frm+frk+ita_old+spa_old.
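As a rough illustration of how such a script model might be selected at run time, here is a minimal Python sketch. It assumes the Tesseract 4 command-line tool and a Fraktur.traineddata file placed in a script/ subdirectory of the tessdata directory. The helper name `tesseract_command` and the file names are hypothetical. Note that the single mixed model is not the same thing as passing `-l enm+frm+frk+ita_old+spa_old`, which would load five separate models.

```python
import shlex

# Languages that the comment above lists as the mixture behind script/Fraktur.
FRAKTUR_SOURCES = "enm+frm+frk+ita_old+spa_old".split("+")


def tesseract_command(image, output_base, model="script/Fraktur", tessdata_dir=None):
    """Build an argument list for the tesseract CLI (hypothetical helper).

    Assumes <tessdata>/script/Fraktur.traineddata exists when the default
    model name is used.
    """
    cmd = ["tesseract", image, output_base, "-l", model]
    if tessdata_dir is not None:
        # --tessdata-dir points tesseract at a non-default model directory.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    print(FRAKTUR_SOURCES)
    # Print the command instead of running it, so no installation is required.
    print(shlex.join(tesseract_command("page.png", "page")))
```

The list returned by `tesseract_command` can be passed to `subprocess.run` once the model file is in place.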

I fixed the description for 4.00 frk in the Wiki. The other Wiki issues are still open.

Akossimon · 2019-10-01T20:46:16Z

Fraktur OCR with Tesseract is what I am looking for. I installed VietOCR v5.5.2 and Tesseract 4.1.0 on my Mac, and now I am trying to find help on how to train it better, because there are too many OCR errors.

How would I go about training the software? Can anyone help?

I am a complete beginner, sadly, and I do not even know how I managed to install the two components so far. This training step is not explained anywhere.

Any pointers in the right direction would be greatly appreciated.

stweil · 2019-11-11T15:46:03Z

In the meantime, newer Fraktur models have become available. There is a description of the training process for those models in the Wiki.

As soon as the training is finished, I'll add the results to tessdata_contrib.

stweil · 2020-01-24T08:00:46Z

@mikegerber, can we close this issue?

stweil closed this as completed May 17, 2023
