
Epochs and loss during training #20

Open
wywhu opened this issue May 7, 2020 · 3 comments

@wywhu commented May 7, 2020

Hi Ed,

I am training embeddings using your default hyperparameters, except for window_size. The minimum number of visits in my dataset is 2, but I set window_size=3, since I assume your code can handle the mismatch between window_size and the actual sequence length. Am I right?

I also noticed that mean_cost reached its minimum at the 2nd epoch and then started increasing. Although I read in your paper that the number of epochs does not affect the code representations very much, I am not sure which epoch I should choose after training finishes. Should I use the model from the minimum-cost epoch, or the one from the last epoch?

@mp2893 (Owner) commented May 9, 2020

Hi wywhu,

If you look at the code, masks are created during the training phase, so the mismatch between window_size and the actual sequence length shouldn't be a problem. That said, I wrote this code four years ago, so this is just speculation.
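
For concreteness, here is a minimal sketch of how length masking typically works (illustrative only, not the actual repository code): per-position costs beyond a patient's real number of visits are zeroed out, so window_size can safely exceed the shortest sequence.

```python
import numpy as np

def masked_mean_cost(costs, seq_lens, window_size):
    """Average per-position costs, ignoring padded positions.

    costs:    (batch, window_size) array of per-position losses
    seq_lens: (batch,) actual number of visits per patient
    """
    positions = np.arange(window_size)               # (window_size,)
    mask = positions[None, :] < seq_lens[:, None]    # True only at real visits
    return (costs * mask).sum() / mask.sum()         # average over real visits

costs = np.ones((2, 3))                              # two patients, window_size=3
print(masked_mean_cost(costs, np.array([2, 3]), 3))  # -> 1.0; the padded slot
                                                     #    contributes nothing
```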

There is no fixed answer as to how many epochs works best, since your dataset is different from the one I used. You can try separating the cost into visit_cost and emb_cost (see line 133 of the source code), see how each behaves, and then select the epoch you like. This of course involves some coding.
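
As a rough illustration (the helper names below are hypothetical stand-ins, not functions from this repository), the bookkeeping could look like this: log the two cost terms separately each epoch, save a checkpoint, and pick the epoch whose costs behave best.

```python
# Hypothetical bookkeeping sketch -- train_one_epoch and save_checkpoint are
# stand-ins for however you run an epoch and persist the weights, with the
# two returned values being the separated visit_cost and emb_cost terms.
history = []
for epoch in range(n_epochs):
    visit_cost, emb_cost = train_one_epoch(model, data)
    history.append((epoch, visit_cost, emb_cost))
    save_checkpoint(model, 'model_epoch%d.npz' % epoch)

# For example, pick the epoch where the visit-level cost bottoms out:
best_epoch = min(history, key=lambda h: h[1])[0]
```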

Hope this helps,
Ed

@wywhu (Author) commented Sep 4, 2020

Thanks Ed. I have another question about interpreting the code representations.

In your paper, it says that "we trained ReLU(W_c), a non-negative matrix, to represent the meaning of …", and "we can find the top k codes that have the largest values for the i-th coordinate by argsort(W_c[i, :])[1, k]".

I am confused: should I look at W_c or ReLU(W_c) in the argsort operation?

@mp2893 (Owner) commented Sep 5, 2020

Actually, you are correct. You should look at ReLU(W_c) in the argsort operation, which guarantees non-negativity.
However, since all medical codes are trained in the non-negative space, I don't think the results would be too different.
But technically you should use ReLU(W_c). Thanks for pointing it out!
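
In NumPy terms, that step could look like the following sketch, where W_c is a 2-D array whose rows index embedding coordinates and whose columns index medical codes (matching the W_c[i, :] indexing above):

```python
import numpy as np

def top_k_codes(W_c, i, k):
    """Top-k medical codes for the i-th embedding coordinate,
    applying ReLU first to enforce non-negativity."""
    relu_row = np.maximum(W_c[i, :], 0.0)   # ReLU(W_c)[i, :]
    return np.argsort(relu_row)[::-1][:k]   # indices of the k largest values

# Example: top 5 codes for coordinate 0
# top_k_codes(W_c, i=0, k=5)
```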
