Thanks for your impressive work! It has really helped me quantize lots of large models.
Recently, I tried to implement GPTQ for grouped Conv2d layers, but the results don't look good.
Could you give me some hints on supporting GPTQ for grouped Conv2d?
Here is my rough implementation so far:
1. In the `add_batch` function, split `inp` into groups and store a separate Hessian for each group.
2. In the `fasterquant` function, split `W` into the same groups and run GPTQ on each chunk of `W` with its corresponding Hessian.
3. Concatenate the per-group `Q` chunks into the full `Q`.
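The three steps above can be sketched in NumPy roughly as follows. This is a minimal illustration under assumptions, not the GPTQ repository's code: `unfold`, `GroupedConvHessians`, `split_weight`, `merge_groups` and `grouped_conv_gemm` are hypothetical names, the conv is assumed to use stride 1 with no padding or dilation, and the Hessian is the same running average of 2/n * X X^T used for dense layers. The detail that is easy to get wrong is that `inp` has to be unfolded (im2col) before it is split: the unfolded rows are ordered (channel, ky, kx), so group g owns rows g*k to (g+1)*k with k = C_in/groups * kh * kw, which is exactly the column range of that group's weight slice. The per-group GPTQ call itself is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def unfold(x, kh, kw):
    """im2col for stride 1, no padding: (N, C, H, W) -> (N, C*kh*kw, L).

    Rows are ordered (channel, ky, kx), i.e. the same order as
    weight.reshape(C_out, -1) for a (C_out, C_in/groups, kh, kw) weight.
    """
    win = sliding_window_view(x, (kh, kw), axis=(2, 3))  # (N, C, Ho, Wo, kh, kw)
    win = win.transpose(0, 1, 4, 5, 2, 3)                # (N, C, kh, kw, Ho, Wo)
    n, c, _, _, ho, wo = win.shape
    return win.reshape(n, c * kh * kw, ho * wo)


class GroupedConvHessians:
    """Hypothetical helper: one GPTQ-style Hessian 2/n * X X^T per conv group."""

    def __init__(self, in_channels, groups, kh, kw):
        self.groups, self.kh, self.kw = groups, kh, kw
        self.k = in_channels // groups * kh * kw  # Hessian size per group
        self.H = np.zeros((groups, self.k, self.k))
        self.nsamples = 0

    def add_batch(self, x):
        cols = unfold(x, self.kh, self.kw)                    # (N, C*kh*kw, L)
        cols = cols.transpose(1, 0, 2).reshape(cols.shape[1], -1)  # (C*kh*kw, N*L)
        # Running average: rescale the old sums, then add the new batch.
        tmp = x.shape[0]
        self.H *= self.nsamples / (self.nsamples + tmp)
        self.nsamples += tmp
        cols = np.sqrt(2 / self.nsamples) * cols
        for g in range(self.groups):
            xg = cols[g * self.k:(g + 1) * self.k]            # rows seen by group g
            self.H[g] += xg @ xg.T


def split_weight(weight, groups):
    """(C_out, C_in/groups, kh, kw) -> `groups` matrices of shape (C_out/groups, k)."""
    return np.split(weight.reshape(weight.shape[0], -1), groups, axis=0)


def merge_groups(Qs, shape):
    """Concatenate per-group (quantized) matrices back into a conv weight."""
    return np.concatenate(Qs, axis=0).reshape(shape)


def grouped_conv_gemm(x, weight, groups):
    """Grouped conv as one matmul per group, to check the split is consistent."""
    c_out, cpg, kh, kw = weight.shape
    cols = unfold(x, kh, kw)
    k = cpg * kh * kw
    outs = [Wg @ cols[:, g * k:(g + 1) * k]
            for g, Wg in enumerate(split_weight(weight, groups))]
    ho, wo = x.shape[2] - kh + 1, x.shape[3] - kw + 1
    return np.concatenate(outs, axis=1).reshape(x.shape[0], c_out, ho, wo)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 10, 10))                          # 4 calibration images
hess = GroupedConvHessians(in_channels=8, groups=8, kh=3, kw=3)  # depthwise 3x3
hess.add_batch(x[:2])
hess.add_batch(x[2:])
print(hess.H.shape)  # (8, 9, 9)
```

`grouped_conv_gemm` is only a consistency check: if it reproduces the layer's output, the input rows and weight rows are split the same way, and each pair `(split_weight(w, groups)[g], hess.H[g])` can be handed to the per-group GPTQ routine before `merge_groups` reassembles the weight.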
Thank you in advance.
In general, your implementation sounds reasonable. One way to verify correctness is to check whether the squared error reported by GPTQ (run with very low dampening) matches the actual squared error you measure on the quantized layer, using the same data.
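A minimal NumPy sketch of that check, assuming a plain column-by-column GPTQ loop (no lazy batch updates) and a hypothetical symmetric 4-bit round-to-nearest grid with a per-row scale; `gptq_quantize`, `check` and `damp_ratio` are illustrative names, not the repository's quantizer. With `H = 2/n * X X^T + damp * I`, the summed per-column losses equal `(1/n) * ||(W - Q) X||^2 + damp/2 * ||W - Q||^2`, so with tiny `damp` they should agree with the measured error to many digits. For a grouped conv, the same comparison would be run per group, with that group's Hessian and unfolded input rows.

```python
import numpy as np


def gptq_quantize(W, H, scale):
    """Plain column-by-column GPTQ; returns (Q, reported squared-error loss).

    W: (rows, cols) weights, H: (cols, cols) Hessian 2/n * X X^T + damp * I,
    scale: (rows,) step size of a hypothetical symmetric 4-bit grid.
    """
    W = W.copy()
    Q = np.zeros_like(W)
    # Upper Cholesky factor U of H^{-1} (H^{-1} = U^T U), as in GPTQ.
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    loss = 0.0
    for i in range(W.shape[1]):
        w = W[:, i]
        q = np.clip(np.round(w / scale), -8, 7) * scale
        d = U[i, i]
        Q[:, i] = q
        loss += np.sum((w - q) ** 2) / d**2 / 2
        # Spread this column's error over the columns not yet quantized.
        err = (w - q) / d
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q, loss


def check(damp_ratio, rows=16, cols=64, n=512, seed=0):
    """Return (GPTQ-reported error, error measured on the layer outputs)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((rows, cols))
    X = rng.standard_normal((cols, n))          # calibration inputs as columns
    H = 2.0 / n * X @ X.T
    H += damp_ratio * np.mean(np.diag(H)) * np.eye(cols)
    scale = np.abs(W).max(axis=1) / 7
    Q, reported = gptq_quantize(W, H, scale)
    measured = np.sum(((W - Q) @ X) ** 2) / n   # actual layer error, same data
    return reported, measured


print(check(1e-8))  # nearly identical values
print(check(1e-1))  # reported > measured by the damping term
```

If the two numbers disagree noticeably at very low dampening, the Hessian and the data used to measure the error do not describe the same problem, which for grouped convs usually points to a mismatch between how the input rows and the weight columns were split.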
However, note that GPTQ is most effective when the Hessian is large, since that leaves a lot of room for compensating errors when quantizing weights. For something like a 3x3 depthwise convolution, the Hessian is only 9x9 for each filter (one input channel times a 3x3 kernel), so GPTQ probably won't be very effective: there is little room for error compensation between weights. Perhaps this is also part of the problem you are seeing with grouped convolutions. Fortunately, in practice such layers are often quite small, so it may make sense to just skip quantizing them (or quantize them to a higher bitwidth).
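The Hessian size that limits error compensation is easy to compute per Conv2d layer: (C_in / groups) * kh * kw. The sketch below turns the suggestion above into a simple rule, assuming a hypothetical cutoff `min_size=64` and illustrative bit choices; the names and numbers are not tuned values from GPTQ. Layers whose per-group Hessian is smaller than the cutoff get `fallback_bits`, or are left unquantized when `fallback_bits` is `None`.

```python
def gptq_hessian_size(in_channels, groups, kh, kw):
    """Weights GPTQ can trade errors between per filter: (C_in / groups) * kh * kw."""
    return in_channels // groups * kh * kw


def choose_bits(in_channels, groups, kh, kw,
                default_bits=4, fallback_bits=8, min_size=64):
    """Hypothetical policy: small per-group Hessians get fallback_bits.

    fallback_bits=None means "leave this layer unquantized".
    min_size=64 is an arbitrary illustrative cutoff, not a tuned value.
    """
    if gptq_hessian_size(in_channels, groups, kh, kw) < min_size:
        return fallback_bits
    return default_bits


print(gptq_hessian_size(256, 256, 3, 3))  # depthwise 3x3: 9
print(gptq_hessian_size(256, 1, 3, 3))    # dense 3x3: 2304
print(choose_bits(256, 256, 3, 3))        # 8
print(choose_bits(256, 1, 3, 3))          # 4
print(choose_bits(256, 256, 3, 3, fallback_bits=None))  # None
```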