Provide a Python wrapper #57

Open
chiarcos opened this issue Jun 25, 2021 · 3 comments

@chiarcos
Contributor

  • Start CoNLLRDFManager with a given configuration via subprocess.Popen, e.g.:

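      # Requires: import os, subprocess, sys
      # Pass the command as an argument list so that paths containing spaces are not split.
      self.parse2graph_call = [
          "bash", os.path.join(self.path, "run.sh"),
          "CoNLLRDFManager", "-c", os.path.join(self.path, "..", "parse2graph.json"),
      ]
      self.parse2graph = subprocess.Popen(
          self.parse2graph_call,
          stdin=subprocess.PIPE,
          stdout=subprocess.PIPE,
          stderr=sys.stderr,        # forward the child's error output
          universal_newlines=True,  # text-mode pipes
          bufsize=0)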
    
  • Provide a parse(self, string) method that

    • writes the input into self.parse2graph.stdin,
    • terminates it with a stop symbol that will be preserved by CoNLL-RDF (e.g. \n#_END_\n\n),
    • reads from self.parse2graph.stdout until the stop symbol is encountered,
    • returns the result.

This is implemented already, but it currently fails because of #56.
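
For illustration, the write/stop-symbol/read cycle described above can be sketched as follows. This is a minimal sketch under assumptions, not the implementation referred to in this issue: the class name PipeWrapper, its close() method and the skipping of leading blank lines are hypothetical, and it assumes the child process passes the stop symbol through unchanged on a line of its own.

```python
import subprocess


class PipeWrapper:
    """Hypothetical wrapper around a long-running, line-oriented child process."""

    STOP = "#_END_"  # stop symbol suggested in the issue text

    def __init__(self, command):
        # command is an argument list, e.g.
        # ["bash", "run.sh", "CoNLLRDFManager", "-c", "parse2graph.json"]
        # stderr is not redirected, so the child inherits the parent's stderr.
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # text-mode pipes
            bufsize=1,                # line-buffered on the Python side
        )

    def parse(self, string):
        # Write the input, then the stop symbol followed by an empty line.
        self.proc.stdin.write(string.rstrip("\n") + "\n" + self.STOP + "\n\n")
        self.proc.stdin.flush()
        lines = []
        # Read until the stop symbol comes back (or the child closes stdout).
        for line in iter(self.proc.stdout.readline, ""):
            if line.strip() == self.STOP:
                break
            if not lines and not line.strip():
                # Skip blank lines left over from a previous call's terminator.
                continue
            lines.append(line)
        return "".join(lines)

    def close(self):
        # Closing stdin signals end of input; then wait for the child to exit.
        self.proc.stdin.close()
        self.proc.wait()
```

Because everything is written before anything is read, a very large input could fill the pipe buffers and block both sides; bulk data would probably need a reader thread or a restart of the child process.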

@chiarcos chiarcos self-assigned this Jun 25, 2021
@chiarcos
Contributor Author

chiarcos commented Jul 2, 2021

This is operational now. Basically, it only requires a single file. Any preferences on where to put it in the overall repository structure? Otherwise, I would put it in examples, but it is unlikely to be found there.

@leogott
Contributor

leogott commented Jul 3, 2021

src/main/python/conll_rdf.py seems like a reasonable choice?

@chiarcos
Contributor Author

chiarcos commented Jul 6, 2021

src/main/python/extract.py (no formatter or updater included yet). Tested only for input from stdin without SPARQL execution.
TODO (low prio):

  • test for SPARQL exec
  • test for processing lists of files
  • test for processing single file
  • test for processing multi-line strings (= full content of one file)
  • test for processing bulk data (to be confirmed whether we may need to restart the extractor process)
  • tested on Ubuntu only => test on Fedora-type Linux and macOS

Output is slightly reformatted (dedup prefixes, one line per word), but not restructured (i.e., no line ordering as provided by CoNLLRDFFormatter).
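
As a rough illustration of the prefix deduplication mentioned above (not the code in extract.py; the function name dedup_prefixes and the whitespace normalization are assumptions), one way to drop repeated Turtle prefix declarations from concatenated output is:

```python
def dedup_prefixes(turtle_text):
    """Keep the first occurrence of each prefix declaration and drop repeats."""
    seen = set()
    kept = []
    for line in turtle_text.splitlines():
        stripped = line.strip()
        # Covers Turtle's "@prefix" and the SPARQL-style "PREFIX" form.
        if stripped.lower().startswith(("@prefix", "prefix ")):
            key = " ".join(stripped.split())  # normalize internal whitespace
            if key in seen:
                continue
            seen.add(key)
        kept.append(line)
    return "\n".join(kept)
```

This only compares declarations line by line, so it assumes each prefix declaration sits on a single line.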
