Killed process #1
qvec was designed to load the whole embedding file into memory, because that makes it easier to calculate column-wise correlations. If you want to use this implementation as-is, you need a machine with enough RAM to hold the whole dataset. I am now working on an improved version of qvec that uses the CCA algorithm instead of a sum of correlations. See qvec_cca.py: that implementation still loads everything into memory, but it does not have to, and can be modified to process data on the fly. However, it requires Matlab to be installed to perform the actual CCA calculation. Please see if it works better for you.
The memory is actually enough: the machine has 51 GB free, so it shouldn't be a memory issue. I suspected that, which is why I ran it on a big machine.
Sorry, I didn't notice you wrote in the first message that you have 51 GB free. However, I still think this is a memory issue, because the "Killed" message comes not from qvec but from your OS. Even though you tried bigger embeddings, a bigger file does not necessarily require more memory: the data is stored in a Python dictionary, so if the bigger file has repeated lines or extra whitespace it may still need less memory once loaded. I suggest you run qvec in one tmux pane and, in another, monitor memory usage with the htop command.
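The point about repeated lines can be seen directly: a Python dict keeps only one entry per word, so duplicated rows in a larger file add nothing to its memory footprint. A toy sketch (the sample words are made up):

```python
vectors = {}
for line in ["dog 1 2", "dog 1 2", "cat 3 4"]:
    parts = line.split()
    # a repeated word simply overwrites the same dict entry
    vectors[parts[0]] = [float(x) for x in parts[1:]]

print(len(vectors))  # two keys survive from three input lines
```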
Well, it's still surprising. Memory usage should depend on the number of rows and the number of columns; everything else should be the same.
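For scale, here is a rough back-of-envelope estimate, assuming the vectors end up as pure-Python lists of floats inside the dictionary (one plausible reading of the storage described above, not a measurement of qvec itself). It ignores the word strings, the dict's own table, and temporary objects created while parsing, all of which push peak usage higher.

```python
import sys

dim = 100            # dimensions per vector, as in the file described here
n_words = 10008676   # lines reported for the embedding file

per_float = sys.getsizeof(1.0)          # ~24 bytes each on 64-bit CPython
per_list = sys.getsizeof([0.0] * dim)   # list header plus pointer array
per_vector = per_list + dim * per_float # each float is a separate object

total_gb = n_words * per_vector / 1024.0 ** 3
print("rough estimate: %.1f GB just for the vectors" % total_gb)
```

On 64-bit CPython this lands on the order of 30 GB before any overhead, so the same order of magnitude as the 51 GB available; peak usage during loading can be substantially higher than the final footprint.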
Running it using gensim resolves the problem, though!
Great, thanks for the update :)
As a suggestion, it would be great to make your code compatible with gensim, since gensim is widely used.
@nick-magnini Thanks! It is on our gensim student project list |
Hi,
Thanks for making the code available. I have an embedding model in this format:
word1 4 -2 3 1 1 1 0 -2 2 3 1 0 0 0 -3 -4 0 0 3 -4 1 -5 2 -2 0 -1 -2 0 0 1 0 0 2 2 0 3 -4 -2 0 -5 -1 1 1 2 -2 0 -2 0 -2 -3 -1 -3 0 0 -5 0 5 -2 -1 -2 0 2 0 0 0 2 5 -3 1 2 1 -3 0 1 3 0 -3 0 1 -2 2 -1 -1 0 -4 2 0 -1 0 0 -1 1 0 -5 2 0 0 0 -2 -2
word2 ...
It contains 10008676 lines and is about 2.5 GB in size. I use Python 2.7; my running command is this:
$> ./qvec-python2.7.py --in_vectors $embedding --in_oracle oracles/semcor_noun_verb.supersenses.en
After printing "Loading VSM file: ....", it runs for around 10-20 minutes and then stops. The only output after it stops is "Killed". It can't be memory, since I tried bigger embeddings and they went through. What could be the possible reason?