
Killed process #1

Open · nick-magnini opened this issue Jan 6, 2016 · 8 comments

@nick-magnini

Hi,

Thanks for making the code available. I have an embedding model in this format:

word1 4 -2 3 1 1 1 0 -2 2 3 1 0 0 0 -3 -4 0 0 3 -4 1 -5 2 -2 0 -1 -2 0 0 1 0 0 2 2 0 3 -4 -2 0 -5 -1 1 1 2 -2 0 -2 0 -2 -3 -1 -3 0 0 -5 0 5 -2 -1 -2 0 2 0 0 0 2 5 -3 1 2 1 -3 0 1 3 0 -3 0 1 -2 2 -1 -1 0 -4 2 0 -1 0 0 -1 1 0 -5 2 0 0 0 -2 -2
word2 ...

It contains 10,008,676 lines and is about 2.5 GB in size. I use Python 2.7. My command is:
$> ./qvec-python2.7.py --in_vectors $embedding --in_oracle oracles/semcor_noun_verb.supersenses.en

After printing "Loading VSM file: ....", it runs for around 10-20 minutes and then stops. The only output after it stops is "Killed". It can't be memory, since I tried bigger embeddings and they went through. What could the possible reason be?
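For context, here is a minimal sketch of how a loader for this "word v1 v2 ... vN" text format might read everything into a word-to-vector dict. This is my own illustration of the pattern, not qvec's actual code:

```python
# Hypothetical loader for the plain "word v1 v2 ... vN" format;
# qvec's real implementation may differ in details.
def load_vectors(path):
    vectors = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors
```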

@ytsvetko
Owner

ytsvetko commented Jan 7, 2016

qvec was designed to load the whole embedding file into memory, because that makes it easier to calculate the column-wise correlations. If you want to use this implementation as-is, you will need a machine with enough RAM to hold the whole dataset.
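To illustrate why the full matrix is needed (a sketch of the idea, not qvec's actual code): the column-wise step pairs every embedding dimension with every oracle feature, so each column must be complete before any correlation can be computed.

```python
import numpy as np

def column_correlations(X, S):
    """Pearson r between each column of X (words x dims)
    and each column of S (words x oracle features)."""
    Xc = X - X.mean(axis=0)
    Sc = S - S.mean(axis=0)
    cov = Xc.T.dot(Sc)  # dims x features
    norms = np.outer(np.linalg.norm(Xc, axis=0),
                     np.linalg.norm(Sc, axis=0))
    return cov / norms
```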

I am now working on an improved version of qvec that uses a CCA algorithm instead of the sum of correlations; see qvec_cca.py. That implementation still loads everything into memory, but it does not have to, and it could be modified to process the data on the fly. However, it requires Matlab to be installed to perform the actual CCA calculation. Please see if it works better for you.
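qvec_cca.py delegates the CCA step to Matlab; purely as a stand-in to show the shape of the computation, scikit-learn's CCA does the analogous thing in Python (dummy data below):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

X = np.random.rand(1000, 50)  # dummy embeddings: words x dims
S = np.random.rand(1000, 40)  # dummy oracle matrix: words x features
cca = CCA(n_components=1).fit(X, S)
x_c, s_c = cca.transform(X, S)
print(np.corrcoef(x_c[:, 0], s_c[:, 0])[0, 1])  # first canonical correlation
```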

@nick-magnini
Author

The memory is actually enough.
$> free -g
             total       used       free     shared    buffers     cached
Mem:            93         42         51          0          0          6

The machine has 51 GB of free memory, so it shouldn't be a memory issue. I suspected that, which is why I ran it on a big machine.

@ytsvetko
Owner

ytsvetko commented Jan 7, 2016

Sorry, I didn't notice that you wrote in the first message that you have 51 GB free. However, I still think this is a memory issue, because the "Killed" message comes not from qvec but from your OS, most likely the kernel's out-of-memory killer. Even though you tried bigger embeddings, that does not necessarily mean the bigger file needs more memory: the data is stored in a Python dictionary, so if the bigger file has repeated lines or extra spaces, it may still take less memory once loaded. I suggest running qvec in one tmux pane and monitoring memory usage with htop in another.
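A quick way to confirm this from inside the process (a standard-library helper of my own, not part of qvec) is to log peak resident memory while the file loads; the kernel also records OOM kills, which dmesg typically shows.

```python
import resource

def peak_rss_gb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on OS X)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024.0 ** 2)

# e.g. print peak_rss_gb() every million lines during loading; if it
# climbs toward the machine's limit just before "Killed" appears,
# the OOM killer is the culprit.
```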

@nick-magnini
Author

Well, it's still surprising. Memory usage should depend only on the number of rows and the number of columns; everything else is the same.
An embedding file with 10,008,676 unique words and 100 dimensions per word should take much more memory than one with the same 10,008,676 unique words and only 15 dimensions each. Isn't that true?
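Rows and columns set the lower bound, but CPython's per-object overhead multiplies it. A rough back-of-envelope, under the assumption that qvec keeps a dict of lists of Python floats:

```python
# Approximate sizes for CPython 2.7 on a 64-bit machine.
words, dims = 10008676, 100
raw = words * dims * 8                   # packed 8-byte doubles: ~7.5 GiB
float_objs = words * dims * 24           # a boxed Python float is ~24 B
list_overhead = words * (8 * dims + 72)  # pointer array + list header (approx.)
print(raw / 2.0 ** 30)                                # ~7.5
print((float_objs + list_overhead) / 2.0 ** 30)       # ~30+
# ...before counting dict buckets, key strings, and the transient
# substrings created by split() on every line.
```

So ~7.5 GiB of raw numbers can plausibly become 30+ GiB once boxed, and with parsing transients on top that can get uncomfortably close to 51 GB.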

@nick-magnini
Author

Running it with gensim resolves the problem, though!
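For anyone landing here, a sketch of the gensim route (the loader shown is the API of gensim 1.x and later; older releases exposed the same method on gensim.models.Word2Vec). Note that the word2vec text format expects a header line with vocabulary size and dimensionality, which a file like the one described above would need:

```python
from gensim.models import KeyedVectors

# "embeddings.txt" is a placeholder path; binary=False for the text format
vectors = KeyedVectors.load_word2vec_format("embeddings.txt", binary=False)
print(vectors["word1"])  # one float32 numpy row per word, held in a single
                         # contiguous matrix rather than a dict of lists
```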

@ytsvetko
Owner

Great, thanks for the update :)


@nick-magnini
Author

As a suggestion, it would be great to make your code compatible with gensim, since gensim is widely used.

@tmylk
Copy link

tmylk commented Aug 12, 2016
