flush output stream #56
This is a bug in CoNLLRDFFormatter's RDF-loader functionality. I'll have a look.
Apparently, the formatter splits sentences only on lines beginning with a # symbol, not when it encounters a new block of prefixes or an empty line. This problem may be related to #32. I think I modified the Formatter to also split on empty lines while working on the CommonsCLI pull request, but I'm not entirely sure. Interestingly, entering an empty line after four "sentences" are in the pipeline results in three of them being processed by the formatter. I wonder what's up with that...
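A minimal sketch of the splitting rule discussed here, assuming that a sentence block ends either at an empty line or when a comment (`#`) or prefix line follows triples of the current block. `SentenceSplitter`, `isHeader` and `split` are hypothetical names for illustration, not the Formatter's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSplitter {

    // Hypothetical rule: comment lines and Turtle/SPARQL prefix declarations
    // are "header" lines that may open a new sentence block.
    public static boolean isHeader(String line) {
        String t = line.trim();
        return t.startsWith("#")
            || t.startsWith("@prefix")
            || t.regionMatches(true, 0, "PREFIX ", 0, 7);
    }

    // Splits lines into sentence blocks. A block ends at an empty line, or
    // when a header line follows body (triple) lines of the current block.
    public static List<List<String>> split(List<String> lines) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean seenBody = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    blocks.add(current);
                    current = new ArrayList<>();
                }
                seenBody = false;
            } else if (isHeader(line)) {
                if (seenBody) {            // new comment/prefix block after triples
                    blocks.add(current);
                    current = new ArrayList<>();
                    seenBody = false;
                }
                current.add(line);
            } else {
                current.add(line);
                seenBody = true;
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current);           // end of input releases the last block
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "# sentence 1",
            "@prefix : <http://example.org/> .",
            ":s1_1 :WORD \"Hello\" .",
            "",                             // empty line ends sentence 1
            "@prefix : <http://example.org/> .",
            ":s2_1 :WORD \"world\" .",
            "# sentence 3",                 // comment after triples ends sentence 2
            ":s3_1 :WORD \"!\" .");
        System.out.println(split(input).size()); // 3
    }
}
```

With this rule, consecutive comment and prefix lines at the start of a block stay together, and a block is complete as soon as its terminating empty line has been read.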
I see similar behavior (using a fresh install). Below is part of the original log. This is a CLI that takes a natural-language sentence as input, parses it (that works), and then uses CoNLL-RDF (extractor + updater + formatter) for an extraction task. "Sending" is what is sent to the CoNLL-RDF process (whitespace normalized). The Turtle is its output to stdout (I'm not reading it). Responses are only produced after the third input is sent. Note that the response is empty in this case (but it's the same with full sentences that normally return RDF), so the response is basically the comment and the prefixes, but no triples. When I send Test3, I get the results for Test1:
I tried to use the behavior above to improvise a workaround (just send multiple padding lines containing
With the most recent update, the behavior could have changed, but I just confirmed that … However, it seems like … This could be a limitation of bash piping: it seems like the pipe has a buffer that only gets flushed once it is sufficiently filled. Of course, this should not apply to a pipeline set up with CoNLLRDFManager, but I think bash piping may be out of our hands.
After discussing the issue, we figured out there are two major parts to it:
The cleanest way to change 1) might be to modify the streaming between classes to be a
I did not verify 3) yet. Will do that now.
I investigated the behavior of a JSON pipeline, and the buffering issue 2) appears to be absent there. @chiarcos, please let me know if you run into problem 2) while using a JSON pipeline. As far as I can tell, your workaround with the injected comments should work there reliably.
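For reference, a minimal sketch of the injected-comment workaround on the sending side, assuming the downstream components treat a comment line as a sentence boundary. The class name `PaddedSender`, the `# padding` marker and the `paddingCount` parameter are hypothetical; how many padding sentences are needed presumably depends on how many components in the chain each hold back one sentence:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;

public class PaddedSender {

    // Hypothetical marker: any comment line, assuming downstream components
    // treat comments as sentence boundaries.
    static final String PADDING = "# padding";

    // Writes one CoNLL sentence followed by paddingCount comment-only
    // pseudo-sentences, then flushes so nothing stays in our own buffer.
    public static void send(PrintWriter out, List<String> rows, int paddingCount) {
        for (String row : rows) {
            out.print(row);
            out.print('\n');
        }
        out.print('\n');                  // empty line terminates the sentence
        for (int i = 0; i < paddingCount; i++) {
            out.print(PADDING);
            out.print("\n\n");
        }
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        send(new PrintWriter(buffer), List.of("1\tHello", "2\tworld"), 2);
        System.out.print(buffer);
    }
}
```

In a CLI like the one described above, `out` would wrap the stdin of the CoNLL-RDF process (e.g. via `Process.getOutputStream()`); the explicit `flush()` matters there, because otherwise the sentence may sit in the caller's own buffer.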
The answers to https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe were illuminating. It should be possible to configure the stdio buffer from inside Java, but I haven't figured out how yet.
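As far as I understand those answers, the pipe itself hands data to the reader as soon as it is written; the delay comes from buffers inside the writing process, and those are what a Java program can control. A minimal sketch, assuming output goes through a `BufferedOutputStream`-backed `PrintStream` (I have not checked how the CoNLL-RDF classes actually wrap their output). `StdoutFlushing`, `CountingSink` and the helper methods are hypothetical names:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

public class StdoutFlushing {

    // Stand-in for the pipe: counts the bytes that actually leave the
    // process-side buffer, i.e. what a downstream reader could see.
    public static class CountingSink extends OutputStream {
        public int bytes = 0;

        @Override
        public void write(int b) {
            bytes++;
        }
    }

    // Prints one line through an 8 KiB buffer and reports how many bytes
    // reached the sink, with or without autoflush.
    public static int bytesVisibleAfterPrintln(boolean autoFlush) {
        CountingSink sink = new CountingSink();
        PrintStream ps = new PrintStream(new BufferedOutputStream(sink, 8192), autoFlush);
        ps.println("sentence 1");
        return sink.bytes;
    }

    // One way to reconfigure stdout from inside Java: a PrintStream directly
    // on the stdout descriptor (no BufferedOutputStream), with autoflush on.
    public static void useAutoFlushingStdout() {
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true));
    }

    public static void main(String[] args) {
        System.out.println(bytesVisibleAfterPrintln(false)); // 0: line still buffered
        System.out.println(bytesVisibleAfterPrintln(true));  // > 0: line flushed
        useAutoFlushingStdout();
        System.out.println("written straight through to stdout");
    }
}
```

The alternative to autoflush is an explicit `flush()` after each completed sentence, which keeps block buffering within a sentence but still hands every finished sentence to the next process.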
Using a fresh install,
Not quite:
;) I strongly suspect 1) to be the reason. In fact, I remember that when writing the first version of the code, there was a design decision to aggregate
Uh, interesting. I didn't catch that one beforehand. Hotfix incoming. Hopefully later this evening.
I might have found it. CoNLL2RDF, line 180: change
to
If that is the source of the problem, the error arises because
I committed the change. @leogott: please double-check that it works and close the issue ;)
Given a Manager pipeline or a line-buffered bash pipeline of StreamExtractor and CoNLLRDFFormatter:
1. StreamExtractor outputs the first sentence, and the Formatter receives it, waiting for a new comment or prefix to tell it that the sentence is complete. (Behavior unchanged)
2. The second sentence is passed from StreamExtractor to the Formatter, which causes the latter to output the first sentence and wait in case there is more to the second sentence.
3. At this point, StreamExtractor terminates successfully. The pipe to the Formatter is closed, causing it to output the second sentence and terminate successfully. (Behavior unchanged)

The components expecting a CoNLL-RDF stream currently hold on to the last sentence they received, so each of them delays the output by one sentence.
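A minimal sketch of an eager alternative, assuming an empty line marks the end of a sentence: the sentence is written and flushed as soon as its terminating empty line arrives, instead of when the next sentence begins. `EagerSentenceEmitter` and its methods are hypothetical and only model the boundary logic, not the RDF processing the real components do:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class EagerSentenceEmitter {
    private final PrintWriter out;
    private final List<String> pending = new ArrayList<>();
    private boolean seenBody = false;

    public EagerSentenceEmitter(PrintWriter out) {
        this.out = out;
    }

    // Feeds one input line. An empty line releases the pending sentence at
    // once instead of waiting for the first line of the next sentence.
    public void accept(String line) {
        boolean header = line.startsWith("#") || line.startsWith("@prefix");
        if (line.trim().isEmpty()) {
            emit();
        } else if (header && seenBody) {
            emit();                 // a new sentence starts: release the old one
            pending.add(line);
        } else {
            pending.add(line);
            if (!header) {
                seenBody = true;
            }
        }
    }

    // End of input releases whatever is left (the existing behavior).
    public void close() {
        emit();
        out.close();
    }

    private void emit() {
        if (pending.isEmpty()) {
            return;
        }
        for (String l : pending) {
            out.print(l);
            out.print('\n');
        }
        out.print('\n');
        out.flush();                // hand the sentence to the next component now
        pending.clear();
        seenBody = false;
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        EagerSentenceEmitter f = new EagerSentenceEmitter(new PrintWriter(sink));
        f.accept("# sentence 1");
        f.accept(":s1_1 :WORD \"Hello\" .");
        System.out.println(sink.toString().isEmpty());  // true: sentence may continue
        f.accept("");
        System.out.println(sink.toString().isEmpty());  // false: released before sentence 2
        f.close();
    }
}
```

If every component in the chain behaved this way, an empty line after each sentence would propagate through the whole pipeline without waiting for the next sentence or for stdin to close.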
The core of your issue report was the trickle-down delay, if I'm not mistaken?
Yes, but now this can be managed by sending a pseudo-sentence. That works for
I've set up a workflow that reads natural language from stdin, produces a parse in a CoNLL format, then transforms that via CoNLLStreamExtractor(+CoNLLRDFUpdater)+CoNLLRDFFormatter and writes the result to stdout.
The problem is that CoNLL-RDF writes to stdout only after stdin is closed.
This happens in Bash pipelines, with or without CoNLLRDFUpdater, and also with CoNLLRDFManager.
For replication, run
and paste the following data in four steps, then close stdin:
1. Copy and paste a table (any table with at least two columns).
2. Enter an empty line (in theory, this should lead to flushing into stdout).
3. Copy and paste another table.
4. Enter an empty line.
5. Close stdin, e.g., with <CTRL>+D.
At the moment, output is flushed only after step 5. The desired behavior is to flush twice (after steps 2 and 4).
Note that if this is confirmed, this is a major bug because it contradicts the entire idea of stream processing that CoNLL-RDF is designed for.