Interested in CYK and Earley chart parsers? #183
Hugo, I've been really interested in parsers lately (working on something like an LR(1) parser myself right now), I'd love to check them out and we can talk about the best way to integrate them into natural. -Ken |
Hi Ken, I created a repository with the chart parsers: Regards, Hugo |
Hugo, Sorry for the delay on this, it looks pretty solid. I think if you could isolate the parsers and stand some tests up around them we could definitely merge them in. -Ken |
Here is a sign of life...
Right now I'm working on tests using the Jasmine framework. At first I started writing tests using assert, but I saw that natural uses Jasmine. Maybe we should look at the interfacing between existing modules in the natural module and the parsers. At the moment the parsers accept a tagged sentence of the form: Best regards, |
Hugo, Looks really exciting! I'll take a closer look tonight if i can. To be honest i'll have to quickly look through how our parsers work but it sounds like the tagged sentence makes sense. -Ken |
Hugo, Sorry for the delay on this, it's still on my list. I've just been traveling for work the past week, so I've been a little distracted. Going to pull everything down now, look at our parsers and see what makes sense. -Ken |
Tests are in the spec folder. ChartParser_spec will fail because I'm using it to test the Head-Corner parser which is still under development. If you switch these lines: the Left-Corner and Earley parsers will be tested. Regards, |
@Hugo-ter-Doest, looking over some of the stuff today. I'm trying to think of what the best fit with our existing code will be. If the parser requires the sentence to be tagged, then for now one would have to use the wordnet module, since we don't currently have a working POS tagger. It would be cool to have an example of that in the docs. Something seems wrong in the unit tests: I tried changing a few values in an effort to break them and they didn't break. I'll work on them a bit and see what I can come up with. |
I will take a look at the Wordnet module to see how I can connect it to the parsers. Regarding the unit tests I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees in lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens I will look into this. Regards, |
Ok i'll take a look, i was just running through it quickly maybe i missed Thanks! On Thu, Oct 23, 2014 at 4:07 PM, Hugo ter Doest |
Regarding Wordnet, I will create an example. Unit tests: it's a good idea to pull the latest code because I'm working on Hugo 2014-10-23 22:09 GMT+02:00 Ken Koch:
Kind regards, |
Yeah, I think it was my mistake on the unit tests, sorry about that. I was just playing with them and everything seems fine. Would you be able to outline for me just the files we would need to include the parsers in natural, as well as what new external dependencies we would have to add? That way I know exactly what I need to do to integrate them. Thanks! |
External dependencies are: plus jasmine-node for testing. Files you need are: And from spec: Hugo |
I forgot to mention the data files for the unit tests. From data you need: |
Did some work on the example with Wordnet. I found out that Wordnet supports the following POS tags: which is quite limited for full parsing. I will think of an example that makes some sense. In the meantime you can check the example. It is in the example folder (where else :-) of chart_parsers. Also, I made the parsers work with a tagged sentence that may have multiple tags per token, like this: Hugo |
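For readers following along, here is a minimal sketch of how a CYK-style recognizer can consume a sentence in which a token carries more than one candidate tag. It is not the chart_parsers API or its input format: the `{ token, tags }` shape, the toy grammar, and the function name `cykRecognize` are all assumptions made up for illustration.

```javascript
// Hypothetical grammar in Chomsky normal form: binary rules only.
// POS tags act as the preterminals supplied by the tagger.
const rules = [
  { lhs: 'S', rhs: ['NP', 'VP'] },
  { lhs: 'NP', rhs: ['DET', 'N'] },
  { lhs: 'VP', rhs: ['V', 'NP'] },
];

// Hypothetical tagged sentence: each token has one or more candidate tags.
const tagged = [
  { token: 'the', tags: ['DET'] },
  { token: 'dogs', tags: ['N'] },
  { token: 'watch', tags: ['N', 'V'] }, // ambiguous token
  { token: 'the', tags: ['DET'] },
  { token: 'cat', tags: ['N'] },
];

// Returns true if `start` can span the whole sentence under `grammar`.
function cykRecognize(sentence, grammar, start) {
  const n = sentence.length;
  if (n === 0) return false;

  // chart[i][len] holds the categories that span `len` tokens from position i.
  const chart = [];
  for (let i = 0; i < n; i++) {
    chart.push([]);
    for (let len = 0; len <= n; len++) chart[i].push(new Set());
    // Seed the chart with every candidate tag of the token.
    for (const tag of sentence[i].tags) chart[i][1].add(tag);
  }

  // Combine shorter spans into longer ones, bottom-up.
  for (let len = 2; len <= n; len++) {
    for (let i = 0; i + len <= n; i++) {
      for (let split = 1; split < len; split++) {
        for (const { lhs, rhs } of grammar) {
          const left = chart[i][split];
          const right = chart[i + split][len - split];
          if (left.has(rhs[0]) && right.has(rhs[1])) chart[i][len].add(lhs);
        }
      }
    }
  }
  return chart[0][n].has(start);
}

const accepted = cykRecognize(tagged, rules, 'S');
console.log(accepted);
```

Seeding the chart with all candidate tags lets the grammar pick the reading that fits (here the verb reading of "watch") without the tagger having to disambiguate first.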
Ok cool, was just looking for some end to end kind of example but i agree wordnet may not be the best. I think a better method for POS tagging is definitely a high priority for me as soon as i get some time to work on it. |
I did something else to complement the Wordnet tags: I wrote a module FunctionWordTagger that reads a bunch of files with function words and tags the rest of the sentence. The module is in the lib folder, the dictionary files are in the data folder. Hugo |
Hugo, Sorry again for the slowness on this, it's been a crazy few weeks at work. Can you just let me know if there are any lodash features you use that aren't included in underscore? I don't really want to have both dependencies since they're pretty interchangeable. I think eventually we could port natural over to lodash, but for now I just want to find the quickest path to getting the parsers integrated. I just created a branch for this, so it's at the top of my list. Hopefully by the end of the weekend I'll have made some serious progress. -Ken |
I will take a look at the underscore features. If I remember right, I use lodash for deep comparison of objects only. I checked and this is supported by underscore as well. So it's probably no problem to exchange them. Hugo |
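Both lodash and underscore expose deep comparison as `_.isEqual(a, b)`, which is why the swap is straightforward. As a dependency-free sketch of what such a check does, the snippet below uses Node's built-in `util.isDeepStrictEqual` instead of either library; the item objects are hypothetical and not taken from the parsers.

```javascript
const util = require('util');

// Two hypothetical chart items with the same content but distinct identity.
const a = { lhs: 'NP', rhs: ['DET', 'N'], dot: 1 };
const b = { lhs: 'NP', rhs: ['DET', 'N'], dot: 1 };

// Deep comparison walks nested properties and arrays.
const sameContent = util.isDeepStrictEqual(a, b);

// Reference comparison only checks whether both names point to one object.
const sameReference = a === b;

console.log(sameContent, sameReference);
```

A chart parser typically needs the first kind of check to avoid adding a duplicate of an item that is already on the chart.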
Oh and there's no hurry. I'm writing this stuff just for fun and to learn a new language and libraries. |
I replaced lodash with underscore! Hugo |
Perfect! i was playing around with it and replaced it in a few places as well so i had a feeling it would have worked. |
This looks amazing. We are still missing a CCG or minimalist grammar. But that is a great list. |
Thanks! Plan is to integrate this into the natural module. Regards, |
I published the chart parsers on npm: Regards, |
I implemented CYK and Earley chart parsers in Node. Interested in including it in the natural package? Please let me know.
Regards,
Hugo