
Questions about complexity analysis #9

Open
2g-XzenG opened this issue Sep 14, 2017 · 5 comments

@2g-XzenG

Hi Ed,

As mentioned in your paper, "Therefore the complexity of Med2Vec is dominated by the code representation learning process, for which we use the Skip-gram algorithm".

I know you use grouper/parent codes to decrease the complexity of the visit-level learning process, but it seems you didn't do much on the code-level part.

Is there a reason why you don't use methods like negative sampling to decrease the complexity of the code-level learning process?

Thanks
Xianlong

@mp2893
Owner

mp2893 commented Sep 15, 2017

That's a good question.
I actually thought about using negative sampling or some other trick (e.g. hierarchical softmax).
But the number of unique codes in the dataset was 30K-40K, which is significantly smaller than the vocabulary sizes you typically see in NLP applications (usually 100K-1M).
So I decided that negative sampling was not the most important thing.
Plus, when I did a preliminary implementation of negative sampling in Theano, I did not see a significant speed-up; the main reason was that the sampling process itself took a long time.
But that was a long time ago, so these days Theano's random sampling mechanism may be faster than it was a couple of years ago.
You are welcome to try it and report back here.
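To make the idea concrete, one code-level skip-gram update with negative sampling looks roughly like the NumPy sketch below (illustrative names only, not the actual Med2Vec/Theano code); the `rng.choice` call corresponds to the sampling step that was the bottleneck:

```python
import numpy as np

def skipgram_neg_sampling_step(W_in, W_out, center, context, n_neg,
                               unigram_probs, lr=0.025, rng=np.random):
    """One SGD update for a (center, context) code pair with negative sampling.

    W_in, W_out   : (num_codes, dim) input / output embedding matrices
    center, context : integer code indices
    n_neg         : number of negative samples
    unigram_probs : sampling distribution over codes (e.g. unigram^0.75)
    """
    # Draw negative code indices; this is the sampling step discussed above.
    negatives = rng.choice(len(unigram_probs), size=n_neg, p=unigram_probs)

    targets = np.concatenate(([context], negatives))   # 1 positive + n_neg negatives
    labels = np.zeros(n_neg + 1)
    labels[0] = 1.0

    v = W_in[center]            # (dim,) input embedding of the center code
    u = W_out[targets]          # (n_neg + 1, dim) output embeddings of sampled codes

    scores = 1.0 / (1.0 + np.exp(-(u @ v)))   # sigmoid of dot products
    grad = scores - labels                    # gradient of the logistic loss

    # Only the sampled rows are updated, instead of the full softmax over all codes.
    W_in[center] -= lr * (grad @ u)
    W_out[targets] -= lr * np.outer(grad, v)
```

The speed-up over the full softmax comes from touching only n_neg + 1 output rows per pair, so it only pays off if the sampling itself is cheap.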

Thanks,
Ed

@2g-XzenG
Author

Thanks for the response.
I rewrote your code in TensorFlow, and with negative sampling the running time dropped from 5 hours to 1.5 hours (a rough sketch of the kind of sampled loss I mean is below the setup list).
Experiment setup:

  1. For the visit level, I did not use any grouper.
  2. batch_size = 128, embedding dimension = 200.
  3. Trained on a single P100 GPU.
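Roughly, the negative-sampling setup I mean looks like this TensorFlow 1.x sketch (sampled softmax; the names and sizes are illustrative, not my exact code):

```python
import tensorflow as tf

num_codes = 40000      # number of unique medical codes (illustrative)
embed_dim = 200
num_sampled = 5        # number of negative samples per true code

center_codes = tf.placeholder(tf.int32, shape=[None])      # batch of input codes
context_codes = tf.placeholder(tf.int32, shape=[None, 1])   # codes to predict

embeddings = tf.get_variable("code_emb", [num_codes, embed_dim])
softmax_w = tf.get_variable("softmax_w", [num_codes, embed_dim])
softmax_b = tf.get_variable("softmax_b", [num_codes])

center_vecs = tf.nn.embedding_lookup(embeddings, center_codes)

# Sampled softmax only evaluates the true code plus `num_sampled` negatives,
# instead of the full softmax over all `num_codes` codes.
loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_w,
        biases=softmax_b,
        labels=context_codes,
        inputs=center_vecs,
        num_sampled=num_sampled,
        num_classes=num_codes))

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```

With num_sampled much smaller than num_codes, each step only touches a handful of output rows instead of the full 30K-40K code softmax.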

@mp2893
Owner

mp2893 commented Sep 19, 2017

Cool!
And there wasn't any noticeable performance drop due to negative sampling?

@2g-XzenG
Author

Actually, I think the answer might be yes, there was a drop. Is this something I should expect? I mean, will negative sampling decrease performance in general?

For evaluation, I used 91 sets of synonymous ICD codes (e.g. [[278.0, 278.00, 278.01], [391.0, 391.1, ...]]).
With your trained embedding, the average similarity is 0.73.
With my TensorFlow version of the original (no demographic information, no grouper at the visit level), the average similarity is 0.58.
With the negative-sampling TensorFlow version, I got 0.29-0.48 depending on the negative sampling size (I tested sizes of 5, 10, 32, and 64; the average similarity decreases as the size increases and is highest at size = 5).
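For reference, the number I report is just the average pairwise cosine similarity inside each synonym set, roughly like this (hypothetical names):

```python
import numpy as np
from itertools import combinations

def avg_synonym_similarity(embeddings, code2idx, synonym_sets):
    """Average pairwise cosine similarity within each set of synonymous ICD codes.

    embeddings   : (num_codes, dim) learned code embedding matrix
    code2idx     : dict mapping an ICD code string to its row in `embeddings`
    synonym_sets : e.g. [['278.0', '278.00', '278.01'], ['391.0', '391.1']]
    """
    sims = []
    for group in synonym_sets:
        idxs = [code2idx[c] for c in group if c in code2idx]
        for i, j in combinations(idxs, 2):
            a, b = embeddings[i], embeddings[j]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))
```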

@mp2893
Owner

mp2893 commented Sep 27, 2017

It's strange that a larger sampling size leads to lower performance.
But generally, negative sampling will of course decrease performance somewhat in practice, because your model is not exposed to the entire label space all the time.
That said, I've never played around with negative sampling extensively, so it's hard for me to give you more tips on how to maintain performance while using it.
