
Is this PTQ or QAT? #1139

Closed
24367452 opened this issue Dec 24, 2024 · 5 comments

Comments

@24367452

Hello, may I ask whether my model only uses PTQ? With the current model, only bit > 8 produces results similar to the non-quantized network, but I really need a lower-bit network, ideally 4 bits. Is QAT used in this model? If QAT is added, will it improve model performance, and how should it be added?

from torch.nn import Module

import brevitas.nn as qnn
from brevitas.quant import Int32Bias


class QuantWeightLeNet(Module):
    def __init__(self, bit=8):
        super(QuantWeightLeNet, self).__init__()
        # Quantize the input activations before the first conv layer
        self.quant_inp = qnn.QuantIdentity(bit_width=bit, return_quant_tensor=True)
        self.conv1 = qnn.QuantConv1d(in_channels=1, out_channels=128, kernel_size=3, padding=1, bias=True, weight_bit_width=bit, bias_quant=Int32Bias)
        self.relu1 = qnn.QuantReLU(bit_width=bit, return_quant_tensor=True)
        self.conv2 = qnn.QuantConv1d(in_channels=128, out_channels=64, kernel_size=3, padding=1, bias=True, weight_bit_width=bit, bias_quant=Int32Bias)
        self.relu2 = qnn.QuantReLU(bit_width=bit, return_quant_tensor=True)
        self.conv3 = qnn.QuantConv1d(in_channels=64, out_channels=32, kernel_size=3, padding=1, bias=True, weight_bit_width=bit, bias_quant=Int32Bias)
        self.relu3 = qnn.QuantReLU(bit_width=bit, return_quant_tensor=True)
        # 32 channels * sequence length 4 (the padding=1, kernel_size=3 convs preserve the length)
        self.fc1 = qnn.QuantLinear(32 * 4, 2, bias=True, weight_bit_width=bit, bias_quant=Int32Bias)

    def forward(self, x):
        out = self.quant_inp(x)
        out = self.relu1(self.conv1(out))
        out = self.relu2(self.conv2(out))
        out = self.relu3(self.conv3(out))
        out = out.reshape(out.shape[0], -1)
        out = self.fc1(out)
        return out
@Giuseppe5
Collaborator

Brevitas supports both PTQ and QAT, and model/module instantiation is identical in both cases.

For QAT, just train your model as you would train any other PyTorch model. Of course there are more nuances to it, but they mostly depend on your specific use case, and I'm afraid we can't help too much with the fine-grained details.
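As a minimal sketch (assuming a standard supervised setup; train_loader, the optimizer, and the loss below are hypothetical placeholders, not part of the code above), QAT with Brevitas is just a regular PyTorch training loop:

import torch

model = QuantWeightLeNet(bit=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
num_epochs = 10

model.train()
for epoch in range(num_epochs):
    for inputs, labels in train_loader:  # hypothetical DataLoader of (input, label) pairs
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()  # gradients pass through the fake-quant ops via the straight-through estimator
        optimizer.step()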

For PTQ, you can check here to see how to apply some of the PTQ algorithms supported in Brevitas.
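As a rough sketch of a PTQ calibration pass (assuming calib_loader is a small DataLoader of representative inputs; calibration_mode is the context manager from brevitas.graph.calibrate used in the Brevitas PTQ examples):

import torch
from brevitas.graph.calibrate import calibration_mode

model = QuantWeightLeNet(bit=8)
model.eval()
with torch.no_grad():
    with calibration_mode(model):  # activation quantizers collect statistics instead of using fixed scales
        for inputs, _ in calib_loader:
            model(inputs)
# After the context exits, the collected statistics set the activation scales for inference.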

As said above, the same model can be used for both PTQ and QAT. Some configurations tend to perform better with QAT and others with PTQ, but it would be difficult to give a comprehensive list here.
Also, PTQ works best if you start from a pre-trained floating-point model, while QAT also works when training a model from scratch.

For low bit-width quantization it is also possible to achieve good performance with PTQ, but that requires more experimentation with settings and algorithms. QAT is definitely more reliable at low bit widths, but it is also more time-consuming.

Hope this helps.

@24367452
Author

Thank you for your answer, I have successfully used QAT!
I set the quantized weights to 4 bits, which should give 16 states, but they only take 15 different values. Why is this?
[attached screenshot showing the 15 distinct weight values]

@Giuseppe5
Collaborator

Giuseppe5 commented Dec 27, 2024

I think this could be due to the narrow_range flag, which removes the most negative value so that the quantization range is symmetric around 0. For example, with 4-bit quantization and narrow_range=True, floating-point numbers are mapped to 15 values from -7 to +7; setting narrow_range to False changes the mapping to the full range from -8 to +7 (16 values).
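As a quick sanity check on the arithmetic (plain PyTorch, purely illustrative, not Brevitas internals):

import torch

bit = 4
narrow = torch.arange(-(2 ** (bit - 1) - 1), 2 ** (bit - 1))  # -7 .. +7
full = torch.arange(-(2 ** (bit - 1)), 2 ** (bit - 1))        # -8 .. +7
print(narrow.numel(), full.numel())  # 15 16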

Try adding weight_narrow_range=False to the layer as a keyword argument and see if this makes a difference.
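For example, on one of the layers from the model above (assuming, as with other Brevitas keyword overrides, that the weight_ prefix forwards the flag to the weight quantizer):

self.conv1 = qnn.QuantConv1d(
    in_channels=1, out_channels=128, kernel_size=3, padding=1, bias=True,
    weight_bit_width=bit, bias_quant=Int32Bias,
    weight_narrow_range=False)  # use the full -8 .. +7 grid at 4 bits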

@24367452
Author

Thanks for your quick reply and detailed explanation.
I now understand the role of narrow_range.

@Giuseppe5
Collaborator

Closing this for now. Feel free to reach out if you have further questions :)
