Is this PTQ or QAT? #1139
Comments
Brevitas supports both PTQ and QAT, and there is no difference in model/module instantiation whether you want to do PTQ or QAT. For QAT, just train your model as you would train any other PyTorch model. There are of course more nuances to that, but they mostly depend on your specific use case, and I'm afraid we can't help too much with the fine-level details. For PTQ, you can check here to see how to apply some of the PTQ algorithms supported in Brevitas.

As mentioned above, the same model can be used for both PTQ and QAT, but some configurations are meant to perform better with QAT and others with PTQ; it would be difficult to give a comprehensive list here. For low bit-width quantization, it is also possible to achieve good performance with PTQ, but it requires more experimentation with settings and algorithms. QAT is definitely more reliable for low bit-width quantization, but it is more time consuming. Hope this helps.
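For illustration, a minimal sketch of the "same model for both" point above, assuming a toy convolutional network: the layer classes (`QuantConv2d`, `QuantReLU`, `QuantLinear`) and the `calibration_mode` context manager are Brevitas APIs, while `QuantModel` and `calib_loader` are hypothetical placeholders.

```python
import torch
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU, QuantLinear
from brevitas.graph.calibrate import calibration_mode

# The same quantized model definition serves both PTQ and QAT;
# only bit_width changes between experiments.
class QuantModel(nn.Module):
    def __init__(self, bit_width=8):
        super().__init__()
        self.conv = QuantConv2d(3, 16, kernel_size=3, weight_bit_width=bit_width)
        self.relu = QuantReLU(bit_width=bit_width)
        self.fc = QuantLinear(16 * 30 * 30, 10, bias=True, weight_bit_width=bit_width)

    def forward(self, x):
        x = self.relu(self.conv(x))  # 3x32x32 -> 16x30x30
        return self.fc(x.flatten(1))

model = QuantModel(bit_width=8)

# PTQ: run a few batches in calibration mode to collect activation
# statistics; no gradient updates are performed.
model.eval()
with torch.no_grad(), calibration_mode(model):
    for images, _ in calib_loader:  # calib_loader: placeholder DataLoader
        model(images)
```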
I think this could be due to the … Try adding …
Thanks for your quick reply and detailed explanation.
Closing this for now. Feel free to reach out if you have further questions :)
Hello, may I ask whether my model only uses PTQ? With the current model, only bit widths above 8 produce results similar to the non-quantized network, but I really need a lower bit-width network, preferably 4 bits. Is QAT used in this model? If QAT is added, will it improve the model performance, and how should QAT be added?
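As the earlier reply notes, QAT in Brevitas is just standard PyTorch training of the quantized model. A minimal sketch at 4 bits, reusing the hypothetical `QuantModel` from the earlier example; `train_loader` and `num_epochs` are placeholders.

```python
import torch

# QAT: instantiate the same model at the target low bit width and train
# as usual; Brevitas fake-quant ops are differentiable (via the straight-
# through estimator), so a standard training loop works unchanged.
model = QuantModel(bit_width=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(num_epochs):          # num_epochs: placeholder
    for images, labels in train_loader:  # train_loader: placeholder
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

A common practice, rather than anything Brevitas-specific, is to initialize the quantized model from pretrained floating-point weights before fine-tuning at the low bit width.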