Thanks for the excellent collection and implementation of vector quantization techniques! It has been very helpful for getting to know these techniques and studying the details.
I've integrated these techniques into my autoencoder and encountered some challenges. Initially, training the autoencoder without quantization yielded good results. However, introducing quantization methods such as ResidualFSQ and ResidualVQ adversely affected both training and validation losses, preventing them from reaching the levels achieved without quantization. Intriguingly, testing the quantization on a smaller subset of the data (0.8k of the 120k samples) yielded results consistent with the unquantized model. Yet training on the entire dataset left the training and validation losses persistently high and decreasing only slowly, almost plateauing.
Upon examining the implementation, I noticed that the quantization mechanisms project the input features into significantly smaller dimensions (8, 16, or 32) before actual quantization occurs. I'm concerned this dimensionality reduction might compromise the model’s representational capacity, as any further quantization steps, such as binarization or scalar quantization, are confined to this limited-dimensional space.
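To make the concern concrete, here is roughly how I am instantiating one of the quantizers (a minimal sketch; the dimensions are illustrative and the keyword names reflect the library version I have installed, so they may differ in yours):

```python
import torch
from vector_quantize_pytorch import ResidualVQ

# Sketch of the setup I am describing; numbers are examples only.
quantizer = ResidualVQ(
    dim = 512,            # encoder output dimension
    codebook_dim = 16,    # features are projected down to this size before quantization
    num_quantizers = 4,
    codebook_size = 1024,
)

z = torch.randn(2, 100, 512)                  # (batch, sequence, dim) from the encoder
quantized, indices, commit_loss = quantizer(z)
# quantized comes back as (2, 100, 512): projected to 16 dims, quantized,
# then projected back up, which is the bottleneck I am worried about.
```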
Has anyone else experienced similar issues with quantization in autoencoders, and if so, how did you address them?
This is certainly possible if you set the codebook dimension too low. However, if you set it right, the bottleneck should be your codebook size, not the projection. I have found that the right value varies quite a bit with your task and model, but a good proxy is probably how large the input vector you are trying to quantize is. I'd check out the tables in this paper for examples of how much harder reconstruction becomes depending on codebook_dim: https://arxiv.org/pdf/2110.04627
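For example, relative to the configuration you posted, something like this keeps the projection from being the limiting factor (just a sketch with numbers you would tune for your own setup; argument names may vary by library version):

```python
from vector_quantize_pytorch import ResidualVQ

# Illustrative only: keep codebook_dim close enough to the encoder dimension
# that the codebook size, not the projection, limits the representation.
quantizer = ResidualVQ(
    dim = 512,
    codebook_dim = 256,   # much larger than 8/16/32
    num_quantizers = 4,
    codebook_size = 1024,
)
```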
Also, as a note, reconstruction should not be the only metric you track (you should also track codebook usage, and try out regularization techniques like dropout). If you only focus on reconstruction, you might run into posterior collapse: https://www.geeksforgeeks.org/what-is-posterior-collapse-phenomenon/
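A quick way to track usage is just to count how many distinct codes the quantizer actually emits over a validation pass, something like this (a sketch, not part of the library):

```python
import torch

def codebook_usage(indices: torch.Tensor, codebook_size: int) -> float:
    # indices: integer code ids returned by the quantizer over a validation pass
    used = torch.unique(indices).numel()
    return used / codebook_size

# e.g. usage = codebook_usage(all_indices, codebook_size = 1024)
# a fraction that stays low or keeps shrinking is an early warning sign of collapse
```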
Thanks, Harish! Your advice is very helpful. I checked the tables in that paper; it seems that narrowing the latent dimension helps codebook usage. I am curious: if we have other ways to increase codebook usage (e.g. FSQ and LFQ), do we still need a low-dimensional latent code to represent the image?
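For context, what I have in mind with FSQ is something like the following (a sketch based on the version I have installed; exact arguments may differ), where the quantized code lives in only `len(levels)` dimensions per quantizer:

```python
import torch
from vector_quantize_pytorch import FSQ

# Sketch: the implied codebook has 8*5*5*5 = 1000 entries and tends to stay
# well used, but the quantized code itself is only 4-dimensional here.
quantizer = FSQ(levels = [8, 5, 5, 5])

z = torch.randn(2, 100, 4)          # last dim must match len(levels) in this setup
quantized, indices = quantizer(z)   # quantized: (2, 100, 4), indices: (2, 100)
```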