
Input scale required #423

Closed
ardeal opened this issue Sep 23, 2022 · 13 comments

@ardeal

ardeal commented Sep 23, 2022

Hi,

I am running the example from the Brevitas GitHub page, as shown below.

I am experiencing the following issue:

Exception has occurred: RuntimeError
Input scale required
  File "/home/brevitas/proxy/parameter_quant.py", line 196, in forward
    raise RuntimeError("Input scale required")
  File "/home/brevitas/nn/quant_layer.py", line 357, in forward_impl
    quant_bias = self.bias_quant(self.bias, output_scale, output_bit_width)
  File "/home/brevitas/nn/quant_conv.py", line 225, in forward
    return self.forward_impl(input)
  File "/home/brevitas_example.py", line 31, in forward
    out = self.relu1(self.conv1(x))
  File "/home/brevitas_example.py", line 50, in <module>
    ret = lpln(timage)

What does 'Input scale required' mean?
How do I scale the input? Is there any example?
One more question:
What kind of input do the layers in Brevitas expect? For example, should I feed float, int, or uint8 tensors to qnn.QuantLinear?

# imports needed to run this snippet (BiasQuant is Int8Bias, as in the Brevitas README example)
import numpy as np
import torch
import torch.nn.functional as F
from torch.nn import Module
from PIL import Image
import brevitas.nn as qnn
from brevitas.quant import Int8Bias as BiasQuant

class LowPrecisionLeNet(Module):
    def __init__(self):
        super(LowPrecisionLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.conv1 = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu1 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.conv2 = qnn.QuantConv2d(6, 16, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu2 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc1   = qnn.QuantLinear(16*5*5, 120, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu3 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(120, 84, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu4 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc3   = qnn.QuantLinear(84, 10, bias=False, weight_bit_width=3)

    def forward(self, x):
        # out = self.quant_inp(x)
        out = self.relu1(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = self.relu2(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.reshape(out.shape[0], -1)
        out = self.relu3(self.fc1(out))
        out = self.relu4(self.fc2(out))
        out = self.fc3(out)
        return out

if __name__ == '__main__':
    lpln = LowPrecisionLeNet()
    file = r"/home/image_1.jpg"
    # image = cv2.imread(file, -1)/255.0
    image = Image.open(file)
    image = np.array(image) / 255.0  # scale pixel values to [0, 1]
    timage = torch.unsqueeze(torch.from_numpy(image).type(torch.float), 0)
    ret = lpln(timage)
@MohamedA95

Hi,
This depends on which bias quantizer you use. Most bias quantizers (e.g. Int8Bias, Int16Bias, Int32Bias) require the input to be quantized. If you do not want to quantize the input, you can use a quantizer like Int8BiasPerTensorFixedPointInternalScaling, which does not require the input to be quantized.
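
As a minimal sketch of both options (not code from this issue; quantizer names as exposed under brevitas.quant, assuming a recent Brevitas version):

import torch
import brevitas.nn as qnn
from brevitas.quant import Int8Bias, Int8BiasPerTensorFixedPointInternalScaling

# Option A: feed the layer a quantized input, so Int8Bias can take its scale from it
quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
conv_a = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3,
                         bias_quant=Int8Bias,  # needs the input scale
                         return_quant_tensor=True)

# Option B: use a bias quantizer with an internally computed scale,
# so a plain float input is accepted
conv_b = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3,
                         bias_quant=Int8BiasPerTensorFixedPointInternalScaling)

x = torch.randn(1, 3, 32, 32)
out_a = conv_a(quant_inp(x))  # quantized input: no "Input scale required" error
out_b = conv_b(x)             # plain float input: also no error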

@ardeal

ardeal commented Sep 23, 2022

@MohamedA95
Thank you!

I have tried my best to understand the Brevitas code, but it is really hard to understand.

Is there any documentation for Brevitas?
In particular, documentation describing the class structure of Brevitas would be very useful.

@MohamedA95

Hi,
I am not sure what the current state of the documentation is, as I am not part of the team behind Brevitas. But there are a couple of notebooks here that are helpful. Also, the Gitter channel has many helpful answers. Furthermore, the ARCHITECTURE.md file is helpful. Overall, I would advise you not to put too much effort into trying to understand where each parameter goes and how it is processed (unless that is your goal), as inheritance is used heavily in Brevitas; just trust that it works.

@ardeal

ardeal commented Sep 24, 2022

@MohamedA95
Thank you!

To be honest, my goal is to port the quantized model to C code and run it on an embedded MCU, so understanding how the code works is very important to me.

I will check the document you pointed out. Thank you again!

@MohamedA95

Hi @ardeal,
In this case, have a look at this paper and this doc.

@ardeal

ardeal commented Sep 27, 2022

@MohamedA95
Many Thanks!

In fact, I have previously read some quantization papers such as https://arxiv.org/pdf/2004.09602.pdf.

Currently, the question for me is that I want to use Brevitas and need to know how it works.

Please check the following test code.

My questions are:

  1. For a QuantConv2d layer, I want the input, weight, bias, and output to all be quantized. How do I do that?
  2. Since weight_quant: Optional[WeightQuantType] = Int8WeightPerTensorFloat, my understanding is that the weights should be quantized to 8-bit data. However, I found that in wq the weights are still float?
  3. I would like to check in the code how a float is quantized to int8. Could you please point me to the exact function/file where the quantization is done?
import torch
from brevitas.nn import QuantConv2d

# QuantConv2d uses Int8WeightPerTensorFloat as its default weight quantizer
default_quant_conv = QuantConv2d(in_channels=2, out_channels=3, kernel_size=(3, 3), bias=False)
torch.nn.init.constant_(default_quant_conv.weight, 0.1)

# input = torch.randn(1, 2, 5, 5)
input = [[[1, 2, 3, 4, 5], [11, 22, 33, 44, 55], [12, 13, 14, 15, 16], [22, 23, 24, 25, 26], [33, 34, 35, 36, 37]],
         [[11, 12, 13, 14, 15], [21, 22, 31, 41, 51], [31, 32, 33, 34, 36], [42, 43, 44, 45, 46], [51, 52, 53, 54, 56]]]

# input = np.ones((2, 5, 5)).tolist()
input = torch.unsqueeze(torch.tensor(input, dtype=torch.float32), 0) / 100.0
out = default_quant_conv(input)

wq = default_quant_conv.quant_weight()

@MohamedA95

Hi,
AFAIK, Brevitas follows the same quantization scheme as the paper I shared previously.
Regarding your questions:

  1. In each layer you can define weight_quant, bias_quant, input_quant, and output_quant. Some layers have default quantizers; for example, the conv layer uses Int8WeightPerTensorFloat as the default weight quantizer, check here. (See the sketch after this list.)
  2. Yes. As far as I know, there is no fixed-point data type in Python. Brevitas stores the parameters in float and exposes three ways to access the weights: .weight gives the original weights, .quant_weight gives the quantized weights in an unquantized (float) format, and .int_weight gives the quantized integer weights you are looking for. How you interpret them depends on the type of quantization, either integer or fixed point. (See the sketch after this list.)
  3. I do not know.
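
As a rough sketch of both points (not code from this issue; quantizer names as exposed under brevitas.quant, and the .quant_weight()/.int_weight() accessors as I understand them in a recent Brevitas version):

import torch
import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerTensorFloat, Int8ActPerTensorFloat, Int8Bias

# Point 1: ask the layer to quantize weights, bias, input, and output
conv = qnn.QuantConv2d(
    in_channels=2, out_channels=3, kernel_size=(3, 3), bias=True,
    weight_quant=Int8WeightPerTensorFloat,
    input_quant=Int8ActPerTensorFloat,
    output_quant=Int8ActPerTensorFloat,
    bias_quant=Int8Bias,  # takes its scale from the quantized input
    return_quant_tensor=True)

x = torch.randn(1, 2, 5, 5)
y = conv(x)  # y is a QuantTensor carrying value, scale, zero_point, and bit_width

# Point 2: three views of the same weights
print(conv.weight)          # raw float parameters as stored by PyTorch
print(conv.quant_weight())  # quantized values expressed in float ("fake-quantized"), with their scale
print(conv.int_weight())    # the underlying integer values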

@ardeal

ardeal commented Sep 28, 2022

Hi @MohamedA95
Thank you!

Regarding question 2) above, I have a further question:
As you said, we can access .weight, .quant_weight, and .int_weight to get different formats of the weights. The question is: in a QuantConv2d layer, what type of data is used for the computation? Does it depend on what we set for weight_quant?

For example, if we set weight_quant to Int8WeightPerTensorFloat, is the int8 form of the weights used for the computation in the layer? Is my understanding correct?

@MohamedA95

I am not sure what type is actually used, but as far as I understand it can be float. You do not have to use an int8 data type to get values in the int8 range. In other words, you can do the quantization using the equation Real_value = Scale * (Quant_value - Zero_point) in float, then clip the values and cast the result to int. Also, AFAIK Python does not support variable-bit-width data types.
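
To make that concrete, here is a minimal, self-contained sketch of that quantize/clip/dequantize round trip carried out entirely in float (plain PyTorch arithmetic, not Brevitas code):

import torch

def fake_quantize_int8(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 fake quantization: zero_point is 0,
    # so Real_value = Scale * Quant_value.
    scale = x.abs().max() / 127.0                        # per-tensor scale
    q = torch.clamp(torch.round(x / scale), -128, 127)   # integer grid values, still stored as float
    return q * scale                                     # dequantized "real" values

x = torch.randn(4, 4)
x_q = fake_quantize_int8(x)
print((x - x_q).abs().max())  # quantization error stays small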

@KeciaHH

KeciaHH commented Mar 30, 2023

Hi, have you solved this problem:
RuntimeError: Input scale required
I am hitting the same error here.
Thanks!

@MohamedA95

Hi @KokyoK,
Usually, this error results from using a bias quantizer that requires the input to be quantized (so that it has a scale factor). Quantizers like Int8Bias require the input to be quantized; you can either quantize the input or use a bias quantizer that does not require an input scale, like Int8BiasPerTensorFloatInternalScaling. For more info, check this tutorial.

@KeciaHH

KeciaHH commented Apr 3, 2023


Problem solved, thanks so much.

@hkayann

hkayann commented Feb 5, 2025

I also get this error when I provide the arguments below:

python brevitas-master/src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py \
  --calibration-dir calibration_data \
  --validation-dir validation_data \
  --dataset cifar10 \
  --dtype float \
  --model-name mobilenet_v3_small \
  --export-onnx-qcdq \
  --calibration-samples 500 \
  --batch-size-calibration 512 \
  --batch-size-validation 512 \
  --weight-bit-width 16 \
  --act-bit-width 16 \
  --quant-format float \
  --weight-mantissa-bit-width 10 \
  --weight-exponent-bit-width 5 \
  --act-mantissa-bit-width 10 \
  --act-exponent-bit-width 5 \
  --no-bias-corr \
  --compile \
  --gpu 0

I can see that some scales are available, but the issue arises when validation starts:

QuantIdentity cached scale: 7.420690963044763e-05
QuantIdentity cached scale: 0.000315548648359254
QuantIdentity cached scale: 7.672010542592034e-05
QuantIdentity cached scale: 4.625794827006757e-05
QuantIdentity cached scale: 9.576442607794888e-06
QuantIdentity cached scale: 7.636739610461518e-05
Starting validation:

and then we get the error.
