
Questions about complexity analysis #9

Open
2g-XzenG opened this issue Sep 14, 2017 · 5 comments

@2g-XzenG

Hi Ed,

As mentioned in your paper, "Therefore the complexity of Med2Vec is dominated by the code representation learning process, for which we use the Skip-gram algorithm".

I know you use grouper/parent codes to decrease the complexity of the visit-level learning process, but it seems you didn't do much on the code-level part.

Is there a reason why you don't use methods like negative sampling to decrease the complexity of the code-level learning process?

Thanks
Xianlong

@mp2893
Owner

mp2893 commented Sep 15, 2017

That's a good question.
I actually thought about using negative sampling or some other trick (e.g. hierarchical softmax).
But the number of unique codes in the dataset was 30K-40K, which is significantly smaller than the vocabulary sizes you typically see in NLP applications (usually 100K-1M).
So I decided that negative sampling was not the most important thing.
Plus, when I did a preliminary implementation of negative sampling in Theano, I did not see a significant speed-up; the main reason was that the sampling process itself took a long time.
But that was a long time ago, so these days Theano's random sampling mechanism may be faster than it was a couple of years ago.
You are welcome to try it and report back here.
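To make the idea concrete, one code-level skip-gram update with negative sampling looks roughly like the NumPy sketch below (illustrative names only, not the actual Med2Vec/Theano code); the `rng.choice` call corresponds to the sampling step that was the bottleneck:

```python
import numpy as np

def skipgram_neg_sampling_step(W_in, W_out, center, context, n_neg,
                               unigram_probs, lr=0.025, rng=np.random):
    """One SGD update for a (center, context) code pair with negative sampling.

    W_in, W_out   : (num_codes, dim) input / output embedding matrices
    center, context : integer code indices
    n_neg         : number of negative samples
    unigram_probs : sampling distribution over codes (e.g. unigram^0.75)
    """
    # Draw negative code indices; this is the sampling step discussed above.
    negatives = rng.choice(len(unigram_probs), size=n_neg, p=unigram_probs)

    targets = np.concatenate(([context], negatives))   # 1 positive + n_neg negatives
    labels = np.zeros(n_neg + 1)
    labels[0] = 1.0

    v = W_in[center]            # (dim,) input embedding of the center code
    u = W_out[targets]          # (n_neg + 1, dim) output embeddings of sampled codes

    scores = 1.0 / (1.0 + np.exp(-(u @ v)))   # sigmoid of dot products
    grad = scores - labels                    # gradient of the logistic loss

    # Only the sampled rows are updated, instead of the full softmax over all codes.
    W_in[center] -= lr * (grad @ u)
    W_out[targets] -= lr * np.outer(grad, v)
```

The speed-up over the full softmax comes from touching only n_neg + 1 output rows per pair, so it only pays off if the sampling itself is cheap.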

Thanks,
Ed

@2g-XzenG
Author

Thanks for the response.
I rewrote your code in TensorFlow, and with negative sampling the running time dropped from 5 hours to 1.5 hours (a rough sketch of the kind of sampled loss I mean is below the setup list).
Experiment setup:

  1. For the visit level, I did not use any grouper.
  2. batch_size = 128, embedding dimension = 200.
  3. Trained on a single P100 GPU.
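Roughly, the negative-sampling setup I mean looks like this TensorFlow 1.x sketch (sampled softmax; the names and sizes are illustrative, not my exact code):

```python
import tensorflow as tf

num_codes = 40000      # number of unique medical codes (illustrative)
embed_dim = 200
num_sampled = 5        # number of negative samples per true code

center_codes = tf.placeholder(tf.int32, shape=[None])      # batch of input codes
context_codes = tf.placeholder(tf.int32, shape=[None, 1])   # codes to predict

embeddings = tf.get_variable("code_emb", [num_codes, embed_dim])
softmax_w = tf.get_variable("softmax_w", [num_codes, embed_dim])
softmax_b = tf.get_variable("softmax_b", [num_codes])

center_vecs = tf.nn.embedding_lookup(embeddings, center_codes)

# Sampled softmax only evaluates the true code plus `num_sampled` negatives,
# instead of the full softmax over all `num_codes` codes.
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_w,
        biases=softmax_b,
        labels=context_codes,
        inputs=center_vecs,
        num_sampled=num_sampled,
        num_classes=num_codes))

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```

With num_sampled much smaller than num_codes, each step only touches a handful of output rows instead of the full 30K-40K code softmax.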

@mp2893
Owner

mp2893 commented Sep 19, 2017

Cool!
And there wasn't any noticeable performance drop due to negative sampling?

@2g-XzenG
Author

Actually, I think the answer might be yes, there was a drop. Is this something I should expect? I mean, will negative sampling decrease performance in general?

For evaluation, I used 91 sets of synonymous ICD codes (e.g. [[278.0, 278.00, 278.01], [391.0, 391.1, ...]]).
With your trained embedding, the average similarity is 0.73.
With my TensorFlow version of the original (no demographic information, no grouper at the visit level), the average similarity is 0.58.
With the negative-sampling TensorFlow version, I got 0.29-0.48 depending on the negative sampling size (I tested sizes of 5, 10, 32, and 64; the average similarity decreases as the size increases and is highest at size = 5).
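For reference, the number I report is just the average pairwise cosine similarity inside each synonym set, roughly like this (hypothetical names):

```python
import numpy as np
from itertools import combinations

def avg_synonym_similarity(embeddings, code2idx, synonym_sets):
    """Average pairwise cosine similarity within each set of synonymous ICD codes.

    embeddings   : (num_codes, dim) learned code embedding matrix
    code2idx     : dict mapping an ICD code string to its row in `embeddings`
    synonym_sets : e.g. [['278.0', '278.00', '278.01'], ['391.0', '391.1']]
    """
    sims = []
    for group in synonym_sets:
        idxs = [code2idx[c] for c in group if c in code2idx]
        for i, j in combinations(idxs, 2):
            a, b = embeddings[i], embeddings[j]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))
```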

@mp2893
Owner

mp2893 commented Sep 27, 2017

It's strange that a larger sampling size leads to lower performance.
But generally, negative sampling will of course decrease performance somewhat in practice, because your model is not exposed to the entire label space all the time.
That said, I've never played around with negative sampling extensively, so it's hard for me to give you more tips on how to maintain performance while using it.
