Feat (equalize): enable rotation matrix optimization #1155
Conversation
Force-pushed from 88f8373 to 0f74e0c
```diff
@@ -511,7 +516,7 @@ def find_module(
     Specifically, it allows to map nn.MultiheadAttetion to its quantized counterpart and not its
     Linear submodules.
     """
-    if _module_class_name(type(model)) in layer_map.keys():
+    if _module_class_name(type_before_parametrizations(model)) in layer_map.keys():
```
I suspect we check for typing elsewhere in this file to apply quantization. Would you mind double checking that?
There's a similar check in layer_handler, which seems to be only called for graph models. Unless I'm missing something, do we want to have it there too? We currently don't have any use case, as far as I'm aware, in which we have a graph model with parametrizations.
There might be cases with graph models + parametrizations in the future, so let's change it there as well so that it works.
Once we have a QuantLinear with a parametrization registered, do we still need to use this new function, or can we fall back to type?
When you register the parametrization, the module's class changes from QuantLinear to ParametrizedQuantLinear, so I guess it's still needed, as type is going to return a class prefixed with Parametrized.
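For illustration, a minimal standalone sketch of that class-name change, using a plain nn.Linear and a toy Scale parametrization as stand-ins for the Brevitas classes (type_before_parametrizations comes from torch.nn.utils.parametrize):

```python
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

# Minimal parametrization just to trigger the class change; any parametrization works.
class Scale(nn.Module):
    def forward(self, weight):
        return 2.0 * weight

layer = nn.Linear(8, 8)
print(type(layer).__name__)  # Linear

parametrize.register_parametrization(layer, "weight", Scale())
print(type(layer).__name__)                                      # ParametrizedLinear
print(parametrize.type_before_parametrizations(layer).__name__)  # Linear
```

Since layer_map is keyed by the unparametrized classes, a lookup via type(model) would miss once a parametrization is registered, which is why the diff above switches to type_before_parametrizations.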
That could be a problem, since I assume it might break stuff like GPTQ. Would you mind checking?
But GPTQ modifies weights in-place, right? I don't think that fits well with parametrized rotations, since the weights are no longer a static tensor in memory; instead they are dynamically computed by running the forward of the parametrization modules, passing the original weight tensor as input. My understanding is that we would need to fuse the rotations before applying GPTQ, and therefore there won't be any problem, as the layers will again be QuantLinear.
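A minimal, self-contained sketch of that fusion step using PyTorch's parametrize utilities; the Rotation module below is a toy stand-in for the actual Brevitas rotation parametrization:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Rotation(nn.Module):
    """Toy stand-in for a rotation parametrization; not the Brevitas implementation."""
    def __init__(self, dim):
        super().__init__()
        self.rot = nn.Parameter(torch.eye(dim))

    def forward(self, weight):
        # While parametrized, the effective weight is recomputed from the stored
        # original tensor on every access, so in-place edits would not persist.
        return weight @ self.rot

layer = nn.Linear(8, 8)
parametrize.register_parametrization(layer, "weight", Rotation(8))

# Bake the current rotated value back into a plain weight tensor ("fuse" the rotation):
parametrize.remove_parametrizations(layer, "weight", leave_parametrized=True)

print(type(layer).__name__)  # Linear again: type-based dispatch sees the original class
```

With leave_parametrized=True the rotated values are written back into an ordinary weight tensor, so subsequent in-place updates, as GPTQ performs, behave as usual.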
Not sure; there could be a world where you first do GPTQ and then optimize the rotations afterwards.
Let's add a comment/warning when adding parametrized rotations that type checking might break.
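A minimal sketch of what such a warning could look like; the helper name and message below are hypothetical, not the actual Brevitas code:

```python
import warnings
import torch.nn.utils.parametrize as parametrize

def register_rotation_parametrization(module, rotation, tensor_name="weight"):
    """Hypothetical helper; the real registration lives in Brevitas' rotation/equalize code."""
    warnings.warn(
        f"Registering a rotation parametrization turns {type(module).__name__} into "
        f"Parametrized{type(module).__name__}; code that dispatches on type() may break. "
        "Use torch.nn.utils.parametrize.type_before_parametrizations() where needed.")
    parametrize.register_parametrization(module, tensor_name, rotation)
```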
Force-pushed from 06e5c1f to aefde7d
Reason for this PR
Changes Made in this PR
Testing Summary
Risk Highlight
Checklist
dev branch.