support for non-sequential TSV data (for Fintan) #26

Open
chiarcos opened this issue Mar 30, 2020 · 0 comments
chiarcos commented Mar 30, 2020

When processing CoNLL, we need to keep track of sequence (nif:nextWord, nif:nextSentence) and structure (nif:Word, nif:Sentence, conll:HEAD).
When processing one-word-per-line dictionary formats (say, Unimorph, TIAD-TSV or OMW-TSV), sequence and groups don't matter anymore.

Request:
Create TSVStreamExtractor as a copy of CoNLLStreamExtractor that
(a) does not produce NIF properties and classes,
(b) replaces all conll: properties with properties in a Fintan namespace,
(c) processes input line by line or in blocks of a fixed (configurable) number of lines, and
(d) separates the RDF output for each input line (or block of lines) with one empty line.
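
To make points (c) and (d) concrete, here is a minimal sketch of the block-wise processing, not the actual implementation. The class name TsvChunkSketch, the namespace URI, the column names and the `<#rowN>` subject scheme are all assumptions made for illustration; the real extractor would reuse whatever CoNLLStreamExtractor does for URIs and literals.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns TSV rows into triples and emits them in blocks.
class TsvChunkSketch {
    // Placeholder namespace; the issue does not say what the Fintan namespace is.
    static final String NS = "http://example.org/fintan/tsv#";

    // One triple per non-empty cell, using the column name as the property name.
    static String rowToTriples(int rowId, String line, String[] columns) {
        String[] cells = line.split("\t", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length && i < columns.length; i++) {
            if (cells[i].isEmpty()) continue;
            // Escape backslashes and quotes so the literal stays well-formed.
            String value = cells[i].replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append("<#row").append(rowId).append("> <").append(NS).append(columns[i])
              .append("> \"").append(value).append("\" .\n");
        }
        return sb.toString();
    }

    // Reads the input line by line and closes a block after every linesPerBlock rows.
    // Blocks are joined with one empty line, as requested in (d).
    static String extract(String tsv, String[] columns, int linesPerBlock) {
        List<String> blocks = new ArrayList<>();
        StringBuilder block = new StringBuilder();
        int rowId = 0;
        int inBlock = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(tsv))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // blank input lines carry no data
                block.append(rowToTriples(++rowId, line, columns));
                if (++inBlock == linesPerBlock) {
                    blocks.add(block.toString());
                    block.setLength(0);
                    inBlock = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (inBlock > 0) blocks.add(block.toString()); // flush an incomplete last block
        return String.join("\n", blocks);
    }

    public static void main(String[] args) {
        String[] columns = {"LEMMA", "FORM", "FEATS"};
        String tsv = "walk\twalked\tV;PST\nrun\tran\tV;PST\n";
        System.out.print(extract(tsv, columns, 1));
    }
}
```

With linesPerBlock set to 1, every input line yields its own block of triples; a larger value groups that many lines before the empty separator line is written.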

Define CoNLLStreamExtractor as a subclass of TSVStreamExtractor.
It overrides the generic Fintan behavior (see above) with the current CoNLL behavior.
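
One possible shape for this hierarchy, as a rough sketch only: the base class exposes two overridable hooks and the CoNLL subclass restores the conll: properties and NIF structure. The class names with the Sketch suffix, the hook methods, the `fintan:` prefix and the row identifiers are assumptions made for illustration, not existing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: plain TSV rows, no NIF typing, no sequence links.
class TSVStreamExtractorSketch {
    // Prefix for column properties; "fintan:" is a placeholder prefix.
    protected String propertyPrefix() {
        return "fintan:";
    }

    // Structural triples for a row; the generic TSV case produces none.
    protected List<String> structureTriples(String row, String previousRow) {
        return new ArrayList<>();
    }

    // Shared conversion logic; subclasses only change the two hooks above.
    public final List<String> convertRow(String row, String previousRow,
                                         String[] columns, String[] cells) {
        List<String> triples = new ArrayList<>(structureTriples(row, previousRow));
        for (int i = 0; i < columns.length && i < cells.length; i++) {
            triples.add(row + " " + propertyPrefix() + columns[i] + " \"" + cells[i] + "\" .");
        }
        return triples;
    }

    public static void main(String[] args) {
        String[] columns = {"WORD", "POS"};
        String[] cells = {"walk", "VERB"};
        System.out.println(new TSVStreamExtractorSketch()
                .convertRow(":s1_2", ":s1_1", columns, cells));
        System.out.println(new CoNLLStreamExtractorSketch()
                .convertRow(":s1_2", ":s1_1", columns, cells));
    }
}

// Hypothetical subclass: restores conll: properties plus nif:Word and nif:nextWord.
class CoNLLStreamExtractorSketch extends TSVStreamExtractorSketch {
    @Override
    protected String propertyPrefix() {
        return "conll:";
    }

    @Override
    protected List<String> structureTriples(String row, String previousRow) {
        List<String> triples = new ArrayList<>();
        triples.add(row + " a nif:Word .");
        if (previousRow != null) {
            triples.add(previousRow + " nif:nextWord " + row + " .");
        }
        return triples;
    }
}
```

Keeping convertRow final in the base class means existing CoNLL callers would see the same output as before, while the TSV case simply skips the structural triples.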

Create FintanUpdater as a copy of CoNLLRDFUpdater and define CoNLLRDFUpdater as a subclass of FintanUpdater:

  • FintanUpdater should support SPARQL SELECT statements to export TSV directly (without CoNLLRDFFormatter)
  • keep the CoNLLRDFUpdater class for backward compatibility, with no difference in functionality
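
For the SELECT export, the serialization step could look roughly like the sketch below. It only covers writing solutions as TSV; running the query itself is left to the SPARQL engine and is not shown. The class SelectToTsvSketch, the example query, the `fintan:` namespace URI, the map-based representation of solutions and the use of `_` for unbound variables are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: writes SELECT solutions as TSV, one row per solution.
class SelectToTsvSketch {
    // Example query only; the namespace URI is a placeholder.
    static final String QUERY =
        "PREFIX fintan: <http://example.org/fintan/tsv#>\n"
        + "SELECT ?lemma ?form ?feats WHERE {\n"
        + "  ?row fintan:LEMMA ?lemma ; fintan:FORM ?form .\n"
        + "  OPTIONAL { ?row fintan:FEATS ?feats }\n"
        + "}";

    // vars gives the column order; each solution maps variable name to value.
    static String toTsv(List<String> vars, List<Map<String, String>> solutions,
                        boolean header) {
        StringBuilder sb = new StringBuilder();
        if (header) {
            sb.append(String.join("\t", vars)).append('\n');
        }
        for (Map<String, String> solution : solutions) {
            List<String> cells = new ArrayList<>();
            for (String var : vars) {
                String value = solution.get(var);
                if (value == null) value = "_"; // assumed marker for unbound variables
                // Tabs and newlines inside a value would break the column layout.
                cells.add(value.replace('\t', ' ').replace('\n', ' '));
            }
            sb.append(String.join("\t", cells)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("lemma", "walk");
        row.put("form", "walked");
        System.out.print(toTsv(Arrays.asList("lemma", "form", "feats"),
                               Arrays.asList(row), true));
    }
}
```

The header flag is there because some dictionary formats expect a header row and others do not.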