
VAE part of model #61

Open
fgvbrt opened this issue Mar 17, 2017 · 6 comments


fgvbrt commented Mar 17, 2017

Hi, it looks like this code actually trains not a VAE but a plain autoencoder. Here are the reasons:

  1. epsilon_std is 0.01 (https://github.com/maxhodak/keras-molecules/blob/master/molecules/model.py#L58) when it should be 1. With a value that small, it is safe to say there is almost no sampling.
  2. The KL loss comes out far too small because of the mean operation (https://github.com/maxhodak/keras-molecules/blob/master/molecules/model.py#L78): the mean is taken over both the feature and sequence axes, but these should be summed so the KL loss has the right scale relative to the cross-entropy loss (see the sketch below).
  3. The picture in the readme also indicates this, because not all regions of the latent space are covered by points. The authors wrote in the paper that they observed exactly this when they trained a plain autoencoder.

Maybe it makes sense to simply train an autoencoder model and compare the results.
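
To make it concrete, here is a rough sketch of the two fixes (Keras 2 syntax; `latent_dim` stands in for `latent_rep_size`, so treat it as an illustration rather than a drop-in patch):

```python
import keras.backend as K

latent_dim = 292   # latent_rep_size in this repo; adjust to your config
epsilon_std = 1.0  # was 0.01 at model.py#L58

def sampling(args):
    z_mean, z_log_var = args
    # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0.0, stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon

def kl_loss(z_mean, z_log_var):
    # sum over the latent dimensions (not mean), so the KL term keeps
    # the right scale relative to the per-sequence cross-entropy loss
    return -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                        axis=-1)
```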

@sbaurdlp

Hi,

I'm working on a similar problem, but with protein sequences rather than molecules.

You mention that epsilon_std is not 1, which also seems quite strange to me. Yet I found this is often the case in other codebases (for example, the Keras tutorial on VAEs). When I changed mine from 1.0 to 1e-3 some months ago, it allowed the model to learn (it didn't before).

Would you say VAEs aren't suited to this kind of problem?

Regards,
Sebastien


larry0x commented Oct 23, 2018

Hi Sebastien, do you have any update on the issue regarding epsilon_std?

I am trying to implement the same model in PyTorch and encountered the same problem. If I set epsilon_std to 1, the model refuses to learn anything (the loss stagnates at a very high value).

If I change this value to 0, the VAE effectively degenerates to a plain AE. It learns very fast, recovering input sequences almost perfectly. But just like any other plain AE, the latent space it produces is sparse, and it generates garbage when interpolating or decoding randomly sampled latent variables.

If I pick a small, non-zero epsilon_std, the result is between the two scenarios: the model learns better than when epsilon_std is set to 1, but not as well as when it is set to zero. In none of these cases does the model work as well as described in Aspuru-Guzik's paper.
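
For reference, the reparameterization step in my PyTorch code looks roughly like this (simplified, with epsilon_std exposed as a knob):

```python
import torch

def reparameterize(mu, log_var, epsilon_std=1.0):
    # z = mu + sigma * (epsilon_std * eps), eps ~ N(0, I).
    # epsilon_std = 1.0 is the standard VAE; epsilon_std = 0.0 removes
    # the noise entirely and the model degenerates to a plain AE.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * epsilon_std * eps
```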

@chaoyan1037

@lyu18
I encountered the same problem as you. I looked into the code from the original paper and found that they anneal epsilon_std. Maybe that helps the model train. I will try it shortly.
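
Something like this (a hypothetical linear ramp; the exact schedule in their code may differ):

```python
def epsilon_std_schedule(epoch, start=0.01, end=1.0, anneal_epochs=20):
    # ramp epsilon_std from a small value up to 1.0 over the first
    # `anneal_epochs` epochs, then hold it at `end`
    t = min(epoch / float(anneal_epochs), 1.0)
    return start + t * (end - start)
```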


larry0x commented Apr 23, 2019

@allenallen1037 That makes a lot of sense! Please let me know if you get any results. Thanks

@chaoyan1037

@lyu18 It helps improve the reconstruction accuracy during training. This is expected, since it amounts to a tradeoff between an AE and a VAE. But the KL divergence loss is quite large, which suggests the latent space may not be smooth. I will investigate further once training finishes.
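
To see this, I log the two loss terms separately rather than only the total (a PyTorch-style sketch with per-batch sums; the names are mine):

```python
import torch
import torch.nn.functional as F

def vae_loss_terms(x, x_recon, mu, log_var):
    # track these two separately: a low total loss can hide a large KL
    # term, a sign the latent space is not well regularized
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon, kl
```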

@maxime-langevin

@allenallen1037 I've encountered the same problem as you. Did you find a workaround that helped you solve it? Thanks
