unable to open .../WNdb/dict/index.adv #115
Comments
@tommedema I'm having a hard time pinpointing this (mainly because these WordNet lookups are really slow). It did crash for me, but I didn't see the same error. In the meantime, is it essential to use WordNet for this? I hate to keep pushing this on people, but the Trie is basically purpose-built for the 'isWord' test. I've built my trie using this code, and you can do something like this:
To get a synchronous and lightning-fast answer. Here are some times (not rock-solid benchmarks, but you get the idea) for comparison: Using WordNet: crashed after about 4k, after a minute or two. I threw the code into a gist if you want to check it out: https://gist.github.com/kkoch986/9899177 In the meantime, I'll tag this as a bug. Thanks,
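For illustration, a trie-based 'isWord' check could look roughly like the sketch below. It is not the code from the gist and does not use natural's own Trie; the `Trie` class and its `add` and `isWord` methods are hypothetical names chosen for this example.

```javascript
// Minimal trie for exact-word membership tests.
// Hypothetical sketch: the class and method names are illustrative and
// are not taken from natural or from the gist linked above.
class Trie {
  constructor() {
    this.root = { children: new Map(), isEnd: false };
  }

  // Insert a word, creating one node per character as needed.
  add(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isEnd: false });
      }
      node = node.children.get(ch);
    }
    node.isEnd = true;
  }

  // Return true only if the exact word was added (a prefix alone is not enough).
  isWord(word) {
    let node = this.root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (node === undefined) return false;
    }
    return node.isEnd;
  }
}

const trie = new Trie();
for (const w of ["quick", "quickly", "slow"]) trie.add(w);

console.log(trie.isWord("quickly")); // true
console.log(trie.isWord("quic"));    // false
```

Because the whole structure lives in memory, each lookup is synchronous and opens no files, which is why it sidesteps the error discussed in this thread.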
Thanks, I am no longer working on this issue.
OK, no problem. Just curious: did you manage to resolve the error?
I didn't :)
As already mentioned, the WordNet module is bare-bones and notoriously underperforming except for simple lookups. You may want to look at https://github.com/moos/wordpos, built on top of natural's WordNet, with optimized performance using additional fast-index files and cached disk reads. Although for simple isWord operations, I agree @kkoch986's suggestion might be better.
Actually going to close this unless someone else runs into a similar problem. I think it's safe to say the WNdb code is not best used directly this way. @moos I haven't had a chance to try the new wordpos module, but it looks pretty cool. Thanks for the tip!
I encounter the same error using the Edit: some more debugging: this appears to be a problem with too many open file handles:
http://stackoverflow.com/questions/8965606/node-and-error-emfile-too-many-open-files it looks like Ouch, this is thorny: even if the file handles are in theory being (eventually) closed, because the API is async we can have a potentially unlimited number of pending calls, and therefore open file handles. Given that I'm looking up hundreds, possibly thousands, of synonyms, this is likely the case :(
natural opens the index file on each lookup, and if you've got thousands of simultaneous lookups, that's how many open files you'll have. wordpos is optimized for multiple async reads and is much faster. You could combine wordpos'
Going to reopen this. It seems like the issue is around the files not being closed correctly. I'll try to dig in further and see if I can find anything.
It turns out the Bluebird promise library can limit concurrency: Promise.map words, ((word) => @lookupWordNetInfo word), concurrency: 10
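The same throttling idea can be sketched in plain JavaScript without Bluebird. This is a minimal sketch under stated assumptions: `mapWithConcurrency` is a hypothetical helper, not Bluebird's implementation, and `fakeLookup` is a stand-in for whatever promise-returning WordNet lookup is actually in use.

```javascript
// Run fn over items with at most `limit` calls in flight at once.
// Hypothetical helper; it only mirrors the idea of Bluebird's
// Promise.map(..., { concurrency }) and is not Bluebird's implementation.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until none remain.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

// Stand-in for an async WordNet lookup (assumption: any promise-returning
// function works here); it records how many calls overlap.
let inFlight = 0;
let peak = 0;
function fakeLookup(word) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve(word.toUpperCase());
    }, 5);
  });
}

const words = Array.from({ length: 50 }, (_, i) => "word" + i);
const done = mapWithConcurrency(words, 10, fakeLookup).then((out) => {
  console.log(out.length, "lookups, peak concurrency", peak);
  return out;
});
```

With a cap like this, the number of simultaneously open index files is bounded by the limit rather than by the number of words queued, which is the point of the `concurrency: 10` option above.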
@ahamid So do you think the problem is too many concurrent calls? That could potentially explain why too many files are open at once.
@kkoch986 Yeah, I'm pretty sure that's the case (well, I haven't proved the opposite - that files aren't eventually getting closed, but the code looked fine on casual inspection). It's just the tradeoff of using an async-only API. I did not get around to using
Yeah, I think it's worth a closer look, maybe an option to just load the thing into memory. There's no reason to keep reading it from files every time anyway, especially if you're doing a large number of lookups.
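As a rough illustration of the load-into-memory idea, the sketch below parses index text once into a Map and serves every later lookup from memory. It is not natural's code: `buildIndex` and `loadIndex` are hypothetical names, the line layout is assumed to follow the WordNet index format (header lines indented by two spaces, then `lemma pos synset_cnt ...` with the synset offsets last), and the sample offsets are made up.

```javascript
const fs = require("fs");

// Parse WordNet index text into a Map from lemma to its synset offsets.
// Assumptions: header lines start with two spaces, and each data line is
// "lemma pos synset_cnt ... synset_offset*" with the offsets as the last fields.
function buildIndex(text) {
  const index = new Map();
  for (const line of text.split("\n")) {
    if (line.trim() === "" || line.startsWith("  ")) continue; // skip header
    const fields = line.trim().split(/\s+/);
    const synsetCount = parseInt(fields[2], 10);
    if (!Number.isInteger(synsetCount)) continue; // not a data line
    // The last synset_cnt fields are the synset offsets.
    index.set(fields[0], fields.slice(fields.length - synsetCount));
  }
  return index;
}

// Hypothetical loader: read the file once, then answer lookups from memory.
function loadIndex(path) {
  return buildIndex(fs.readFileSync(path, "utf8"));
}

// Inline sample in the assumed format (offsets are made up), so the
// sketch runs without the WNdb files installed.
const sample = [
  "  1 This line stands in for the license header.",
  "quickly r 2 1 ! 2 2 00000001 00000002  ",
  "slowly r 2 1 ! 2 1 00000003 00000004  ",
].join("\n");

const index = buildIndex(sample);
console.log(index.has("quickly")); // true
console.log(index.has("quikly"));  // false
console.log(index.get("slowly"));
```

The tradeoff is memory for file handles: the file is opened exactly once, so the number of pending lookups no longer affects how many files are open.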
So just to give everyone the latest news on this: I am kicking off a rewrite of the natural WordNet layer, which should result in cleaner code and better performance. Hopefully in the next few weeks I'll have something to show for this and we can finally close this issue.
I think wordpos already solves this problem. Not only that, its 'fastIndex' provides a 30x performance boost over natural's WordNet methods. I'm happy to contribute any or all parts of wordpos's code to this effort, either as a rewrite, a sub-module, or a drop-in plugin.
@moos See #211 and #170. The plan is to reimplement for performance and stability while maintaining the base API. There's a good chance we will build more functionality on top of the basic API, but the main plan is to at least stabilize the code using the same API and move the WordNet downloading to an in-library corpus manager. I would love to have your input on this whole effort as well. I'm just getting into the actual WordNet files and coming up with a plan for indexing them more efficiently.
When I perform the following a couple of million times I get the error
unable to open .../WNdb/dict/index.adv
Is there anything I can do to resolve this?