
unable to open .../WNdb/dict/index.adv #115

Open
tommedema opened this issue Dec 23, 2013 · 16 comments

@tommedema

When I perform the following a couple of million times I get the error unable to open .../WNdb/dict/index.adv:

function isWord(text, cb) {
    wordnet.lookup(text, function(results) {
        cb(Array.isArray(results) && results.length > 0);
    });
}

Is there anything I can do to resolve this?

@kkoch986 kkoch986 added Bugs and removed Bugs labels Mar 7, 2014
@kkoch986
Member

@tommedema I'm having a hard time pinpointing this (mainly because these wordnet lookups are really slow). It did crash for me, but I didn't find the same error.

In the meantime, is it essential to use wordnet for this? I hate to keep pushing this on people, but the Trie is basically purpose-built for the 'isWord' test. I've built my trie using this code, and then you can do something like this:

trie.contains(word);

to get a synchronous and lightning-fast answer.
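To illustrate why a trie-based isWord test is so fast, here is a minimal hand-rolled trie in plain JavaScript. This is a sketch, not natural's actual Trie implementation, but the shape of the lookup is the same: a synchronous, in-memory walk of at most word.length nodes, with no file I/O at all.

```javascript
// Minimal trie: insert words once up front, then each membership
// test is a synchronous walk over at most word.length nodes.
class Trie {
  constructor() {
    this.root = {};            // each node maps char -> child node
  }
  add(word) {
    let node = this.root;
    for (const ch of word) {
      node = node[ch] || (node[ch] = {});
    }
    node.$ = true;             // mark end-of-word (sketch: assumes '$' is not a letter)
  }
  contains(word) {
    let node = this.root;
    for (const ch of word) {
      node = node[ch];
      if (!node) return false; // path missing: not a word, not even a prefix
    }
    return node.$ === true;    // reject bare prefixes like 'runn'
  }
}

const trie = new Trie();
['run', 'runner', 'running'].forEach((w) => trie.add(w));

console.log(trie.contains('runner'));  // true
console.log(trie.contains('runn'));    // false: a prefix, but not a word
```

Because nothing here touches the filesystem, there is no file handle to leak, which is why this approach sidesteps the crash entirely.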

Here are some times (not rock-solid benchmarks, but you get the idea) for comparison:

Using WordNet: crashed after about 4k lookups, after a minute or two
Using Trie: 23,588,700 lookups in ~39 seconds

Threw the code into a gist if you want to check it out: https://gist.github.com/kkoch986/9899177

In the meantime, I'll tag this as a bug.

Thanks,
-Ken

@kkoch986 kkoch986 added the Bugs label Mar 31, 2014
@tommedema
Author

Thanks, I am no longer working on this issue.

@kkoch986
Member

OK, no problem. Just curious, did you manage to resolve the error?

@tommedema
Author

I didn't :)

@moos
Contributor

moos commented May 3, 2014

As already mentioned, the WordNet module is bare-bones and notoriously slow for anything beyond simple lookups. You may want to look at https://github.com/moos/wordpos, built on top of natural's WordNet, with performance optimized using additional fast-index files and cached disk reads.

Although for simple isWord operations, I agree @kkoch986's suggestion might be better.

@kkoch986
Member

kkoch986 commented May 5, 2014

Actually going to close this unless someone else runs into a similar problem. I think it's safe to say the WNdb code is not best used directly this way. @moos, I haven't had a chance to try the new wordpos module, but it looks pretty cool. Thanks for the tip!

@kkoch986 kkoch986 closed this as completed May 5, 2014
@ahamid

ahamid commented Oct 14, 2014

I encounter the same error using the wordnet.lookup(word, cb) API.
If I wait a few seconds I get the same error for data.adv. Both index.adv and data.adv exist on disk at the reported location and are readable under the current user.

Edit: some more debugging: this appears to be a problem with too many open file handles:

{ [Error: EMFILE, open '/home/aaron/blah/blah/node_modules/WNdb/dict/index.adv']
  errno: 20,
  code: 'EMFILE',
  path: '/home/aaron/blah/blah/node_modules/WNdb/dict/index.adv' }

http://stackoverflow.com/questions/8965606/node-and-error-emfile-too-many-open-files

It looks like index_file.js and data_file.js may not be appropriately calling the file-close callback in their, um, callback...

Ouch, this is thorny: even if the file handles are in theory being (eventually) closed, because the API is async, we can have potentially unlimited pending calls, and therefore opened file handles. Given I'm looking up hundreds, possibly thousands, of synonyms, this is likely the case :(

@moos
Contributor

moos commented Oct 14, 2014

natural opens the index file on each lookup, and if you've got thousands of simultaneous lookups, that's how many open files you'll have. wordpos is optimized for multiple async reads and is much faster. You could combine wordpos' getPOS() or isX() method with its lookupX() for better performance than natural's lookup().

@kkoch986
Member

Going to reopen this; it seems like the issue is the files not being closed promptly. I'll try to dig in further and see if I can find anything.

@kkoch986 kkoch986 reopened this Oct 15, 2014
@ahamid

ahamid commented Oct 15, 2014

It turns out the Bluebird promise library's map function supports a concurrency option that can limit the number of pending promises, so I used that to work around this problem.

Promise.map words, ((word) => @lookupWordNetInfo word), concurrency: 10
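For anyone not using Bluebird (or CoffeeScript), the same bounded-concurrency idea can be sketched dependency-free in plain JavaScript. The `lookupWordNetInfo` function below is a hypothetical stand-in for whatever async lookup you are rate-limiting; the point is that at most `limit` calls (and therefore at most `limit` open file handles) are ever in flight at once.

```javascript
// Map an async fn over items with at most `limit` calls in flight.
// Bounding concurrency bounds the number of simultaneously open files,
// which is what avoids the EMFILE error.
function mapWithConcurrency(items, fn, limit) {
  const results = new Array(items.length);
  let next = 0;                      // index of the next unclaimed item
  async function worker() {
    while (next < items.length) {
      const i = next++;              // safe: JS is single-threaded between awaits
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  return Promise.all(workers).then(() => results);
}

// Hypothetical async lookup standing in for wordnet.lookup.
const lookupWordNetInfo = (word) =>
  new Promise((resolve) => setImmediate(() => resolve(word.length)));

mapWithConcurrency(['cat', 'horse', 'ox'], lookupWordNetInfo, 10)
  .then((lengths) => console.log(lengths));  // [ 3, 5, 2 ]
```

Results come back in input order regardless of completion order, matching Bluebird's Promise.map behavior.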

@kkoch986
Member

@ahamid So do you think the problem is too many concurrent calls? That could potentially explain why too many files are open at once.

@ahamid

ahamid commented Oct 23, 2014

@kkoch986 Yeah, I'm pretty sure that's the case (well, I haven't proven the opposite, that files aren't eventually getting closed, but the code looked fine on casual inspection). It's just the tradeoff of using an async-only API. I never got around to using wordpos since the map trick did the job; it has become my go-to hammer for this sort of thing.

@kkoch986
Member

Yeah, I think it's worth a closer look, maybe an option to just load the whole thing into memory. There's no reason to keep reading it from files every time anyway, especially if you're doing a large number of lookups.

@kkoch986
Member

So just to give everyone the latest news on this: I am kicking off a rewrite of natural's wordnet layer, which should result in cleaner code and better performance. Hopefully in the next few weeks I'll have something to show for it and we can finally close this issue.

@moos
Contributor

moos commented Feb 8, 2015

I think wordpos already solves this problem; on top of that, its 'fastIndex' provides a 30x performance boost over natural's WordNet methods. I'm happy to contribute any or all of wordpos's code to this effort, whether as a rewrite, a sub-module, or a drop-in plugin.
If you go the wholesale-rewrite route, I'm afraid it'll break wordpos, since it was built on top of the WordNet module's API.

@kkoch986
Member

kkoch986 commented Feb 9, 2015

@moos see #211 and #170 the plan is to reimplement for performance/stability while maintaining the base API.

There's a good chance we will build more functionality on top of the basic API, but the main plan is to at least stabilize the code using the same API and move the wordnet downloading into an in-library corpus manager. Would love your input on this whole effort as well; I'm just getting into the actual wordnet files and coming up with a plan for indexing them more efficiently.
