Fix extra intra-word spacing in Chinese and Japanese (GitHub issue #991)
Add `preserve_interword_spaces 1` to the *_vert.traineddata files.

It can now be removed from the traineddata files that load them
as a sublanguage.

Signed-off-by: Stefan Weil <[email protected]>
stweil committed May 21, 2019
1 parent 1f2cb09 commit 0309ca8
Showing 6 changed files with 10 additions and 9 deletions.
chi_sim/chi_sim.config: 1 addition, 3 deletions
@@ -1,7 +1,5 @@
#Fixes https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

tessedit_load_sublangs chi_sim_vert

# Important configurations for CJK mode

# New Segmentation search params
chi_sim_vert/chi_sim_vert.config: 3 additions, 0 deletions
@@ -1,5 +1,8 @@
# Important configurations for CJK mode

# Fix https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

# New Segmentation search params
language_model_ngram_on 1
segsearch_max_char_wh_ratio 1.3
chi_tra/chi_tra.config: 0 additions, 3 deletions
@@ -1,8 +1,5 @@
tessedit_load_sublangs chi_tra_vert

# Fix https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

# Important configurations for CJK mode

# New Segmentation search params
chi_tra_vert/chi_tra_vert.config: 3 additions, 0 deletions
@@ -1,5 +1,8 @@
# Important configurations for CJK mode

# Fix https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

# New Segmentation search params
language_model_ngram_on 1
segsearch_max_char_wh_ratio 1.3
jpn/jpn.config: 0 additions, 3 deletions
@@ -1,6 +1,3 @@
#Fixes https://github.com/tesseract-ocr/tesseract/issues/988
preserve_interword_spaces 1

tessedit_load_sublangs jpn_vert
# Important configurations for CJK mode

jpn_vert/jpn_vert.config: 3 additions, 0 deletions
@@ -1,5 +1,8 @@
# Important configurations for CJK mode

# Fix https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

# New Segmentation search params
language_model_ngram_on 1
segsearch_max_char_wh_ratio 1.3
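The config files touched by this commit are plain text: `#` comment lines, blank lines, and one `name value` pair per line. As a rough illustration, the Python sketch below parses the post-commit excerpts of `jpn.config` and `jpn_vert.config` from the hunks above and checks where `preserve_interword_spaces` is now set. It is a simplified reading of the format, not Tesseract's own config reader, and the function name `parse_tesseract_config` is hypothetical.

```python
from typing import Dict


def parse_tesseract_config(text: str) -> Dict[str, str]:
    """Parse config text made of 'name value' lines (simplified sketch).

    Assumption: blank lines and lines starting with '#' are ignored, the
    parameter name ends at the first whitespace, and the rest of the line
    is the value. This is not Tesseract's actual parser.
    """
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


# First lines of jpn_vert.config after this commit (from the hunk above).
JPN_VERT_EXCERPT = """# Important configurations for CJK mode

# Fix https://github.com/tesseract-ocr/tesseract/issues/991
preserve_interword_spaces 1

# New Segmentation search params
language_model_ngram_on 1
segsearch_max_char_wh_ratio 1.3
"""

# First lines of jpn.config after this commit (from the hunk above).
JPN_EXCERPT = """tessedit_load_sublangs jpn_vert
# Important configurations for CJK mode

"""

if __name__ == "__main__":
    vert = parse_tesseract_config(JPN_VERT_EXCERPT)
    main = parse_tesseract_config(JPN_EXCERPT)
    print(vert["preserve_interword_spaces"])  # prints 1
    print("preserve_interword_spaces" in main)  # prints False
    print(main["tessedit_load_sublangs"])  # prints jpn_vert
```

Values are kept as strings because the sketch does not know each parameter's type.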

1 comment on commit 0309ca8

@YangtseSu

Where can I download the compiled tessdata after this commit is merged?
