
Interested in CYK and Earley chart parsers? #183

Open
Hugo-ter-Doest opened this issue Aug 22, 2014 · 25 comments

@Hugo-ter-Doest
Collaborator

I implemented CYK and Earley chart parsers in Node. Interested in including it in the natural package? Please let me know.

Regards,
Hugo

@kkoch986
Member

Hugo,

I've been really interested in parsers lately (working on something like an LR(1) parser myself right now), I'd love to check them out and we can talk about the best way to integrate them into natural.

-Ken

@Hugo-ter-Doest
Collaborator Author

Hi Ken,

I created a repository with the chart parsers:
https://github.com/Hugo-ter-Doest/chart_parsers
The parsers are in /routes. I built a web app around them for testing and demonstration purposes. I think we/I can strip the parsers down to the algorithms and integrate them into natural. Test code and documentation also need to be added.

Regards, Hugo

@kkoch986
Member

kkoch986 commented Sep 3, 2014

Hugo,

Sorry for the delay on this, it looks pretty solid. I think if you could isolate the parsers and stand some tests up around them we could definitely merge them in.

-Ken

@Hugo-ter-Doest
Collaborator Author

Here is a sign of life...
I did a lot of work on the parsers:

  • Parsers now work with the same Chart, Item and Grammar objects.
  • The data structure of items follows the InfoVis format so that parse trees can be visualised easily.
  • The Grammar object is based on a PEG built with http://pegjs.majda.cz/. It reads unification grammars as well; that's for later, when I add feature structures and unification.
  • Added a Left-Corner parser.
  • The Earley and Left-Corner parsers inherit from a generic Chart Parser.
  • Turned the recognisers into parsers: they now record the parse by keeping track of the children used to recognise an item (a rule).
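A minimal sketch of what a shared Chart/Item design like the one described above might look like (the class and property names here are illustrative, not the actual chart_parsers API):

```javascript
// Illustrative sketch of a chart-parser item that records its children,
// so a full parse tree can be rebuilt from the chart. Names are
// hypothetical, not the real chart_parsers API.
class Item {
  constructor(rule, dot, from, to) {
    this.rule = rule;     // e.g. { lhs: 'S', rhs: ['NP', 'VP'] }
    this.dot = dot;       // position of the dot in rhs
    this.from = from;     // start index in the sentence
    this.to = to;         // end index in the sentence
    this.children = [];   // completed sub-items used to recognise this item
  }

  isComplete() {
    return this.dot === this.rule.rhs.length;
  }
}

class Chart {
  constructor(n) {
    // one set of items per position 0..n
    this.sets = Array.from({ length: n + 1 }, () => []);
  }

  add(item) {
    this.sets[item.to].push(item);
  }
}

const chart = new Chart(3);
const item = new Item({ lhs: 'S', rhs: ['NP', 'VP'] }, 2, 0, 3);
chart.add(item);
console.log(item.isComplete()); // true: the dot is at the end of the rhs
```

Recording children on each item (rather than only recognising) is what turns a recogniser into a parser, as the last bullet above describes.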

Right now I'm working on tests using the Jasmine framework. At first I started writing tests using assert, but I saw that natural uses Jasmine.

Maybe we should look at the interfacing between the existing modules in natural and the parsers. At the moment the parsers accept a tagged sentence of the form:
[['I', 'NP'],
['saw', 'V'],
['the', 'DET'],
['man', 'N'],
['with', 'P'],
['the', 'DET'],
['telescope', 'N']]

Best regards,
Hugo
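In the tagged-sentence format above, the terminal symbols the grammar sees are the tags rather than the words themselves. A small sketch of that relationship (plain JavaScript, no chart_parsers API assumed):

```javascript
// A tagged sentence in the format the parsers accept: each entry is
// [word, tag].
const taggedSentence = [
  ['I', 'NP'], ['saw', 'V'], ['the', 'DET'], ['man', 'N'],
  ['with', 'P'], ['the', 'DET'], ['telescope', 'N']
];

// The grammar's terminal symbols are the tags, not the words:
const tags = taggedSentence.map(([, tag]) => tag);
console.log(tags); // ['NP', 'V', 'DET', 'N', 'P', 'DET', 'N']
```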

@kkoch986
Member

Hugo,

Looks really exciting! I'll take a closer look tonight if I can. To be honest, I'll have to quickly look through how our parsers work, but it sounds like the tagged-sentence format makes sense.

-Ken

@kkoch986
Member

Hugo,

Sorry for the delay on this; it's still on my list, I've just been traveling for work the past week, so I've been a little distracted. Going to pull everything down now, look at our parsers, and see what makes sense.

-Ken

@Hugo-ter-Doest
Collaborator Author

Tests are in the spec folder. ChartParser_spec will fail because I'm using it to test the Head-Corner parser, which is still under development. If you switch these lines:
//[LeftCornerParser, EarleyParser].forEach(function(ChartParser) {
[HeadCornerParser].forEach(function(ChartParser) {

the Left-Corner and Earley parsers will be tested.

Regards,
Hugo

@kkoch986
Member

@Hugo-ter-Doest, looking over some of the stuff today. I'm trying to think of what the best fit with our existing code will be.

It seems to me that, since the parser requires the sentence to be tagged, for now one would have to use the WordNet module, as we don't have a working POS tagger currently. It would be cool to have an example of that in the docs.

Something seems wrong in the unit tests: I tried changing a few values in an effort to break them, and they didn't break. I'll work on them a bit and see what I can come up with.

@Hugo-ter-Doest
Collaborator Author

I will take a look at the Wordnet module to see how I can connect it to the parsers.

Regarding the unit tests, I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees on lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens, I will look into this.

Regards,
Hugo

@kkoch986
Member

OK, I'll take a look; I was just running through it quickly, maybe I missed something. Also, re: WordNet, I don't think it needs to be integrated with the parsers, but maybe an example of taking a string, tokenizing it, looking it up in WordNet, and producing a list of tagged tokens to pass to the parser. I think it'll be a useful example for people who want to use the parsers in the future.

Thanks!
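The pipeline Ken describes (tokenize, look up, emit tagged tokens) could be sketched roughly as follows. The lexicon here is a small stub standing in for a real WordNet lookup (natural's WordNet module does this asynchronously against the actual database), so the function names and data are illustrative only:

```javascript
// Sketch of: string -> tokens -> dictionary lookup -> tagged sentence.
// The lexicon below is a stub for a WordNet lookup, not real WordNet data.
const lexicon = {
  i: ['n'], saw: ['v', 'n'], man: ['n', 'v'], telescope: ['n', 'v']
};

function tagSentence(sentence) {
  return sentence
    .toLowerCase()
    .split(/\s+/)                                   // naive tokenizer
    .map(token => [token, ...(lexicon[token] || ['unknown'])]);
}

console.log(tagSentence('I saw the man'));
// [['i','n'], ['saw','v','n'], ['the','unknown'], ['man','n','v']]
```

Tokens missing from the dictionary are tagged 'unknown', matching the format Hugo shows later in this thread.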


@Hugo-ter-Doest
Collaborator Author

Regarding Wordnet, I will create an example.

Unit tests: it's a good idea to pull the latest code, because I'm working on it on a daily basis.

Hugo


@kkoch986
Member

Yeah, I think it was my mistake on the unit tests, sorry about that; I was just playing with them and everything seems fine.

Would you be able to outline for me just the files we would need to include the parsers in natural, as well as what new external dependencies we would have to add? That way I know exactly what I need to do to integrate them.

Thanks!

@Hugo-ter-Doest
Collaborator Author

External dependencies are:
"lodash": "*",
"fs": "*",
"log4js": "*",
"typeof": "*"

plus jasmine-node for testing.

Files you need are:
from lib:
Agenda.js
ChartParser.js
CYKParser.js
EarleyItem.js
GrammarParser.js
LeftCornerParser.js
Chart.js
CYK_Item.js
DoubleDottedItem.js
EarleyParser.js
GoalItem.js
HeadCornerParser.js
PEG-grammar-for-unification-grammar.txt

And from spec:
ChartParser_spec.js
Chart_spec.js
CYKParser_spec.js
EarleyItem_spec.js
GrammarParser_spec.js
HeadCornerParser_spec.js

Hugo

@Hugo-ter-Doest
Collaborator Author

I forgot to mention the data files for the unit tests. From data you need:
math_expressions.txt
test_grammar_for_CYK.txt
minimal_grammar.txt
test_grammar_for_CFG.txt

@Hugo-ter-Doest
Collaborator Author

Did some work on the example with Wordnet. I found out that Wordnet supports the following POS tags:
n NOUN
v VERB
a ADJECTIVE
s ADJECTIVE SATELLITE
r ADVERB

which is quite limited for full parsing. I will think of an example that makes some sense. In the meantime you can check the example; it is in the example folder (where else :-) of chart_parsers.

Also, I made the parsers work with a tagged sentence that may have multiple tags per token, like this:
[["I","s","n"],["saw","v","n"],["the","unknown"],["man","v","n"],["with","unknown"],["the","unknown"],["telescope","v","n"]]

Hugo
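With multiple tags per token, a chart parser's scan step has to try every tag alternative at each position. A small illustrative sketch of that expansion (the item shape is hypothetical, not the chart_parsers API):

```javascript
// Each entry is [word, tag1, tag2, ...]; the scanner adds one lexical
// item per tag alternative, spanning position i..i+1.
const tagged = [['saw', 'v', 'n'], ['the', 'unknown']];

const lexicalItems = [];
tagged.forEach(([word, ...tags], i) => {
  tags.forEach(tag => {
    lexicalItems.push({ tag, word, from: i, to: i + 1 });
  });
});

console.log(lexicalItems.length); // 3: saw:v, saw:n, the:unknown
```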

@kkoch986
Member

OK, cool; I was just looking for some end-to-end kind of example, but I agree WordNet may not be the best. I think a better method for POS tagging is definitely a high priority for me as soon as I get some time to work on it.

@Hugo-ter-Doest
Collaborator Author

I did something else to complement the Wordnet tags: I wrote a module, FunctionWordTagger, that reads a set of files with function words and tags the rest of the sentence. The module is in the lib folder; the dictionary files are in the data folder.

Hugo
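The idea behind a function-word tagger is that closed-class words (determiners, prepositions, etc.) come from small fixed word lists, while open-class words are left to WordNet. A rough sketch (the word lists and function name are illustrative; the real FunctionWordTagger reads its lists from files in the data folder):

```javascript
// Illustrative function-word lists; the real module loads these from
// dictionary files rather than hard-coding them.
const functionWords = {
  DET: ['the', 'a', 'an'],
  P:   ['with', 'in', 'on']
};

function tagFunctionWords(tokens) {
  return tokens.map(token => {
    for (const [tag, words] of Object.entries(functionWords)) {
      if (words.includes(token)) return [token, tag];
    }
    // leave open-class words for WordNet (or mark them unknown)
    return [token, 'unknown'];
  });
}

console.log(tagFunctionWords(['the', 'man', 'with', 'the', 'telescope']));
// [['the','DET'], ['man','unknown'], ['with','P'], ['the','DET'], ['telescope','unknown']]
```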

@kkoch986
Member

Hugo,

Sorry again for the slowness on this; it's been a crazy few weeks at work. Can you just let me know if there are any lodash features you use that aren't included in underscore? I don't really want to have both dependencies, since they're pretty interchangeable.

I think eventually we could port natural over to lodash, but for now I just want to find the quickest path to getting the parsers integrated.

I just created a branch for this, so it's at the top of my list; hopefully by the end of the weekend I'll have made some serious progress.

-Ken

@Hugo-ter-Doest
Collaborator Author

I will take a look at the underscore features. If I remember rightly, I use lodash only for deep comparison of objects. I checked, and this is supported by underscore as well, so it should be no problem to swap them.

Hugo

@Hugo-ter-Doest
Collaborator Author

Oh and there's no hurry. I'm writing this stuff just for fun and to learn a new language and libraries.

@Hugo-ter-Doest
Collaborator Author

I replaced lodash with underscore!

Hugo

@kkoch986
Member

Perfect! I was playing around with it and replaced it in a few places as well, so I had a feeling it would work.

@silentrob
Member

This looks amazing. We are still missing a CCG or minimalist grammar. But that is a great list.

@Hugo-ter-Doest
Collaborator Author

Thanks! The plan is to integrate this into the natural module.

Regards,
Hugo

@Hugo-ter-Doest
Collaborator Author

I published the chart parsers on npm:
npm install chart-parsers

Regards,
Hugo

@Hugo-ter-Doest Hugo-ter-Doest self-assigned this Apr 5, 2018