Feat (equalize): enable rotation matrix optimization #1155

Merged
merged 26 commits into Xilinx:dev from feat-rotation-optim-integration
Jan 28, 2025

Conversation

pablomlago (Collaborator)

Reason for this PR

Changes Made in this PR

Testing Summary

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@pablomlago pablomlago marked this pull request as ready for review January 15, 2025 17:13
@pablomlago pablomlago force-pushed the feat-rotation-optim-integration branch from 88f8373 to 0f74e0c Compare January 16, 2025 11:08
@@ -511,7 +516,7 @@ def find_module(
Specifically, it allows mapping nn.MultiheadAttention to its quantized counterpart and not its
Linear submodules.
"""
-    if _module_class_name(type(model)) in layer_map.keys():
+    if _module_class_name(type_before_parametrizations(model)) in layer_map.keys():
Collaborator:

I suspect we check for typing elsewhere in this file to apply quantization. Would you mind double checking that?

Collaborator Author (pablomlago):

There's a similar check in layer_handler, which seems to be called only for graph models. Unless I'm missing something, do we want to have it there too? As far as I'm aware, we currently don't have any use case in which a graph model has parametrizations.

Collaborator:

There might be cases with graph models + parametrizations in the future, so let's change it there as well so that it works.

Once we have a QuantLinear with a parametrization registered, do we still need to use this new function or can we fall back to type?

Collaborator Author (pablomlago), Jan 16, 2025:

When you register the parametrization, the module's class changes from QuantLinear to ParametrizedQuantLinear, so I guess it's still needed, as type is going to return a class prefixed with Parametrized.
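
For illustration, a minimal sketch of the class-name change being discussed, assuming a recent PyTorch with torch.nn.utils.parametrize.type_before_parametrizations, and using a plain nn.Linear with a toy Rotation parametrization in place of Brevitas' QuantLinear:

```python
# Hedged sketch (not Brevitas code): shows how registering a parametrization renames
# the module class and how type_before_parametrizations recovers the original one.
import torch
import torch.nn as nn
from torch.nn.utils import parametrize


class Rotation(nn.Module):
    """Toy weight parametrization: right-multiplies the weight by a fixed orthogonal matrix."""

    def __init__(self, size: int):
        super().__init__()
        q, _ = torch.linalg.qr(torch.randn(size, size))
        self.register_buffer("rot", q)

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return weight @ self.rot


layer = nn.Linear(8, 4)
print(type(layer).__name__)  # Linear

parametrize.register_parametrization(layer, "weight", Rotation(8))
print(type(layer).__name__)  # ParametrizedLinear -> lookups keyed on type() no longer match
print(parametrize.type_before_parametrizations(layer).__name__)  # Linear
```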

Collaborator:

That could be a problem, since I assume it might break stuff like GPTQ. Would you mind checking?

Collaborator Author (pablomlago), Jan 16, 2025:

But GPTQ modifies weights in-place, right? I don't think that fits well with parametrized rotations, since the weights are no longer a static tensor in memory, but are instead computed dynamically by running the forward of the parametrization modules, passing the original weight tensor as input. My understanding is that we would need to fuse the rotations before applying GPTQ, and therefore there won't be any problem, as the layers will again be QuantLinear.
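
As a rough sketch of the fusing step mentioned here, assuming the rotation is registered as a weight parametrization, remove_parametrizations with leave_parametrized=True bakes the rotated weight back into a static parameter (again with nn.Linear standing in for QuantLinear):

```python
# Hedged sketch: fuse a rotation parametrization into a plain weight before an
# in-place algorithm such as GPTQ runs.
import torch
import torch.nn as nn
from torch.nn.utils import parametrize


class Rotation(nn.Module):
    def __init__(self, size: int):
        super().__init__()
        q, _ = torch.linalg.qr(torch.randn(size, size))
        self.register_buffer("rot", q)

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return weight @ self.rot


layer = nn.Linear(8, 4)
parametrize.register_parametrization(layer, "weight", Rotation(8))

# While parametrized, layer.weight is recomputed on every access from the stored
# original tensor, so in-place edits to it would not persist.
parametrize.remove_parametrizations(layer, "weight", leave_parametrized=True)

print(type(layer).__name__)                    # Linear again
print(isinstance(layer.weight, nn.Parameter))  # True: a plain, static tensor once more
```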

Collaborator:

Not sure; there could be a world where you first do GPTQ and then optimize the rotations afterwards.
Let's add a comment/warning when adding a parametrized rotation that type checking might break.
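
A hypothetical sketch of such a warning; register_rotation_parametrization is an illustrative helper name, not the function added in this PR:

```python
# Hedged sketch of the suggested warning; helper name and message are illustrative only.
import warnings

import torch.nn as nn
from torch.nn.utils import parametrize


def register_rotation_parametrization(module: nn.Module, rotation: nn.Module) -> None:
    # Warn that type(module) will report a Parametrized* class from now on, which can
    # break type-based dispatch (e.g. in GPTQ) until the rotation is fused back in.
    warnings.warn(
        f"Registering a rotation parametrization on {type(module).__name__}: type() will "
        "now return a Parametrized* class; type checks may break unless "
        "type_before_parametrizations() is used or the rotation is fused first.")
    parametrize.register_parametrization(module, "weight", rotation)
```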

@pablomlago pablomlago force-pushed the feat-rotation-optim-integration branch from 06e5c1f to aefde7d Compare January 21, 2025 12:17
@Giuseppe5 Giuseppe5 self-requested a review January 23, 2025 12:49
@Giuseppe5 Giuseppe5 requested review from Giuseppe5 and removed request for Giuseppe5 January 27, 2025 10:39
@Giuseppe5 Giuseppe5 requested review from Giuseppe5 and removed request for Giuseppe5 January 27, 2025 13:41
@Giuseppe5 Giuseppe5 requested review from Giuseppe5 and removed request for Giuseppe5 January 27, 2025 14:14
@Giuseppe5 Giuseppe5 merged commit 024d4a7 into Xilinx:dev Jan 28, 2025
391 of 396 checks passed
@pablomlago pablomlago linked an issue Jan 31, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Optimized rotation matrix