gt-fraktur for twelve 19C books with pdf, alto, page, mets and metadata xml files.
0 parents
commit 0716831
Showing 734 changed files with 390,977 additions and 0 deletions.
@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
##############################################################################
#Author: @SVAKSHA, Tuesday 21 January 2014 14:00:20 PM IST
##############################################################################


#====================================================
# dont ignore PDFs
#====================================================


#====================================================
# GENERAL
#====================================================
.directory
PRIVATE/*
pvt/*
IGNORE/*

#----------------------------------------------------
# DATA folders
#----------------------------------------------------
**/datum
datum/
datum/**
#----------------------------------------------------
# all zip/tar folders
#----------------------------------------------------
*.zip
*.tar.gz

#====================================================
# DVCS
#====================================================
*.git
*.hg
.hgignore

#====================================================
# EDITOR: Vi
#====================================================
*.swp

@@ -0,0 +1,58 @@
# GT-FRAKTUR

[gt-fraktur](https://github.com/ubtue/gt-fraktur/) is the Ground Truth (GT) data for Fraktur/Gothic prints from the 19th Century, released by UB, Uni-Tübingen as Open Data under the [CC0 public license](https://creativecommons.org/choose/zero/).

+ [TOC](#toc)
+ [GT-Data](#gt-data)
+ [Quality Issues](#quality-issues)
+ [LICENSE](#license)

----

# TOC

## GT Data

This repository contains transcriptions of selected pages from 19th Century books as listed below. The original TIFF images used for OCR transcription of the following publications are published on Archive.org under the [CC0 public license](https://creativecommons.org/choose/zero/).

### Shelfmarks / DigitalIDs of the 19th Century Fraktur prints selected for transcription:

| # | FolderName | NumberOfPages | URL-Shelfmark-DigitalID | Comments |
| :-- | :--- | :-- | :--- | :--- |
| 01. | [agtck_1834_02](https://github.com/ubtue/gt-fraktur/tree/master/agtck_1834_02) | 15 pgs | http://idb.ub.uni-tuebingen.de/opendigi/agtck_1834_02 | |
| 02. | [akzs_1860](https://github.com/ubtue/gt-fraktur/tree/master/akzs_1860) | 24 pgs | http://idb.ub.uni-tuebingen.de/opendigi/akzs_1860 | |
| 03. | [artl_001](https://github.com/ubtue/gt-fraktur/tree/master/artl_001) | 20 pgs | http://idb.ub.uni-tuebingen.de/opendigi/artl_001 | |
| 04. | [artl_002](https://github.com/ubtue/gt-fraktur/tree/master/artl_002) | 18 pgs | http://idb.ub.uni-tuebingen.de/opendigi/artl_002 | Error in 1 image. |
| 05. | __drey1834__ | 5 pgs | http://idb.ub.uni-tuebingen.de/opendigi/drey1834 | |
| 06. | __harless1834__ | 7 pgs | http://idb.ub.uni-tuebingen.de/opendigi/harless1834 | |
| 07. | [kath_1830_035](https://github.com/ubtue/gt-fraktur/tree/master/kath_1830_035) | 18 pgs | http://idb.ub.uni-tuebingen.de/opendigi/kath_1830_035 | |
| 08. | __litrdsch_1875__ | 38 pgs | http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1875 | Errors in 2 images. |
| 09. | __stml_1871_01__ | 22 pgs | http://idb.ub.uni-tuebingen.de/opendigi/stml_1871_01 | |
| 10. | [thlblb_1866](https://github.com/ubtue/gt-fraktur/tree/master/thlblb_1866) | 25 pgs | http://idb.ub.uni-tuebingen.de/opendigi/thlblb_1866 | Errors in 3 images. |
| 11. | [zpkt_1832_01](https://github.com/ubtue/gt-fraktur/tree/master/zpkt_1832_01) | 8 pgs | http://idb.ub.uni-tuebingen.de/opendigi/zpkt_1832_01 | |
| 12. | __zpk_1838_01__ | 7 pgs | http://idb.ub.uni-tuebingen.de/opendigi/zpk_1838_01 | |

### Quality Issues

Details of the page quality issues observed during the transcription process:

| # | Shelfmark-DigitalID | Quality Bugs |
| :-- | :--- | :----- |
| 1. | artl_002 | `artl_002_00010.tif` has bad alignment |
| 2. | litrdsch_1875 | Misprint |
| 3. | litrdsch_1875 | Misprint: `litrdsch_1875_0146.tif` (page 28); lines 6-38 in the left column |
| 4. | thlblb_1866 | `thlblb_1866_00037.tif` has a crossed "o" (e.g. ø, Unicode: U+00F8) in the word "Redaction" in multiple places on the page; these were manually corrected to a regular "o" during transcription. |
| 5. | thlblb_1866 | `thlblb_1866_00121.tif`, right column: the long ſ appears to have been corrected manually |
| 6. | thlblb_1866 | `thlblb_1866_00425.tif`, left column: the word "fünfte" is blurred; it looks as if there are two "f". |
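
The crossed "o" noted for `thlblb_1866_00037.tif` was corrected by hand, but a similar check could be scripted. The following is a minimal sketch, not the procedure used for this data set: the function names and the sample string are hypothetical, and it assumes the transcription is available as plain Unicode text.

```python
# Hypothetical helpers for illustration only; not part of the gt-fraktur repository.
CROSSED_O = "\u00f8"  # ø, the character named in the quality table


def find_crossed_o(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line that contains U+00F8."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CROSSED_O in line
    ]


def normalize_crossed_o(text: str) -> str:
    """Replace every U+00F8 with a regular 'o', mirroring the manual correction."""
    return text.replace(CROSSED_O, "o")


# Made-up sample text, not taken from the scanned page.
sample = "Die Redacti\u00f8n\nder Zeitschrift"
hits = find_crossed_o(sample)
cleaned = normalize_crossed_o(sample)
print(hits)
print(cleaned)
```

Listing the affected lines before replacing them keeps the correction reviewable, which matters if ø could ever be a legitimate character in the source text.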

----

# LICENSE

* This data is released by UB, Uni-Tuebingen as Open Data under the [CC0 public license](https://creativecommons.org/choose/zero/).

@@ -0,0 +1 @@
# Token Tag Nested-Tag IsSpaceAfter/EndOfLine
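
The header above names four columns. A file that follows it could be read roughly as sketched below. This sketch assumes the columns are tab-separated and that only the first line is a `#` header; neither is confirmed by the commit. The `Row` type, the `read_rows` function and the sample values are hypothetical placeholders.

```python
import csv
import io
from typing import NamedTuple


class Row(NamedTuple):
    """One data line, with fields named after the four header columns."""
    token: str
    tag: str
    nested_tag: str
    spacing: str  # the "IsSpaceAfter/EndOfLine" column


def read_rows(stream) -> list[Row]:
    """Parse tab-separated lines (an assumption), skipping a leading '#' header."""
    rows = []
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, fields in enumerate(reader):
        if not fields:
            continue  # skip blank lines
        if index == 0 and fields[0].startswith("#"):
            continue  # header line such as "# Token<TAB>Tag<TAB>..."
        padded = (fields + [""] * 4)[:4]  # tolerate short lines
        rows.append(Row(*padded))
    return rows


# Placeholder content; the tag and spacing values are invented for illustration.
sample = (
    "# Token\tTag\tNested-Tag\tIsSpaceAfter/EndOfLine\n"
    "Die\tTAG_A\tTAG_B\tSpaceAfter\n"
    "Redaction\tTAG_A\tTAG_B\tEndOfLine\n"
)
rows = read_rows(io.StringIO(sample))
for row in rows:
    print(row.token, row.spacing)
```

Only the first line is treated as a header so that a literal `#` token in the data is not dropped by mistake.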
Binary file not shown.
Binary file not shown.