
How to adopt GPTQ on Conv2d with groups attribute? #35

Open
TMYuan opened this issue Jul 14, 2023 · 1 comment

Comments


TMYuan commented Jul 14, 2023

Hi,

Thanks for your impressive work! It has really helped me quantize many large models.
Recently, I have been trying to apply GPTQ to grouped Conv2d layers, but the results are not good.
Could you give some hints on supporting GPTQ for grouped Conv2d?

Here is my rough implementation so far (a code sketch follows the list):

  1. In the add_batch function, split inp into groups and accumulate a separate Hessian for each group.
  2. In the fasterquant function, split W into the same groups and apply GPTQ to each chunk of W with its corresponding Hessian.
  3. Concatenate the per-group results back into the full quantized weight Q.
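
Roughly in code, it looks like this (simplified sketch; `quantize_with_gptq` is a placeholder for the standard GPTQ routine applied to a 2-D weight matrix with its Hessian):

```python
import torch
import torch.nn.functional as F

def grouped_gptq(layer, calib_inputs, quantize_with_gptq):
    """Per-group GPTQ for a grouped nn.Conv2d (sketch).

    quantize_with_gptq(W2d, H) stands in for the usual GPTQ routine on a
    (rows, cols) weight matrix with a (cols, cols) Hessian.
    """
    G = layer.groups
    W = layer.weight.data                       # (out_ch, in_ch // G, kH, kW)
    cols = W[0].numel()                         # (in_ch // G) * kH * kW
    dev = W.device

    # 1. Accumulate one Hessian per group from unfolded input patches.
    H = [torch.zeros(cols, cols, device=dev) for _ in range(G)]
    for x in calib_inputs:                      # x: (batch, in_ch, h, w)
        for g, xg in enumerate(x.chunk(G, dim=1)):   # split input channels
            patches = F.unfold(xg, layer.kernel_size,
                               dilation=layer.dilation,
                               padding=layer.padding,
                               stride=layer.stride)  # (batch, cols, L)
            p = patches.transpose(1, 2).reshape(-1, cols).float()
            H[g] += 2.0 * p.t() @ p             # same 2 * X X^T convention as GPTQ

    # 2. Quantize each group's weight slab against its own Hessian.
    Q = []
    for g, Wg in enumerate(W.chunk(G, dim=0)):  # split output channels
        W2d = Wg.reshape(Wg.shape[0], -1)       # (out_ch // G, cols)
        Q.append(quantize_with_gptq(W2d, H[g]).reshape(Wg.shape))

    # 3. Concatenate the quantized groups back into the full weight.
    layer.weight.data = torch.cat(Q, dim=0)
```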

Thank you in advance.

@efrantar
Member

Hi!

In general, your implementation sounds reasonable. One thing you can do for correctness verification is to check whether the squared errors reported by GPTQ (with very low dampening) match the actual squared error you measure on the quantized layer, using the same data.
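
Schematically, the check could look like this (names are illustrative; `err_gptq` is whatever squared-error total your GPTQ routine reports, and you need both the original and the quantized layer plus the same calibration inputs):

```python
import torch

@torch.no_grad()
def measured_sq_error(layer_fp, layer_q, calib_inputs):
    # Total squared output error of the quantized layer on the calibration
    # data; with very low dampening this should roughly match err_gptq.
    total = 0.0
    for x in calib_inputs:
        total += (layer_fp(x) - layer_q(x)).pow(2).sum().item()
    return total
```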

However, one thing to note in general is that GPTQ is most effective when the Hessian is large, since there is then a lot of room for error compensation when quantizing weights. For something like a 3x3 depthwise convolution, the Hessian is only 9x9 for each filter, so GPTQ probably won't be very effective: there is only limited possibility for error compensation between weights. Perhaps this is also part of the problem you are seeing with groupwise convolutions. Fortunately, in practice, such layers are often quite small, so it might make sense to just skip quantizing them (or quantize them at a higher bitwidth).
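
For instance, a simple (purely illustrative) rule of thumb along these lines:

```python
def bits_for(layer, default_bits=4):
    # Per-group Hessian size is cols x cols; for tiny Hessians (e.g. a 3x3
    # depthwise conv gives 9x9), skip quantization or use a higher bitwidth.
    cols = layer.weight[0].numel()
    if cols <= 16:
        return None          # None = leave this layer in full precision
    return default_bits
```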
