Feat (equalize): enable rotation matrix optimization #1155
Conversation
Force-pushed from 88f8373 to 0f74e0c
```diff
@@ -511,7 +516,7 @@ def find_module(
     Specifically, it allows to map nn.MultiheadAttetion to its quantized counterpart and not its
     Linear submodules.
     """
-    if _module_class_name(type(model)) in layer_map.keys():
+    if _module_class_name(type_before_parametrizations(model)) in layer_map.keys():
```
I suspect we check for typing elsewhere in this file to apply quantization. Would you mind double checking that?
There's a similar check in layer_handler, which seems to be only called for graph models. Unless I'm missing something, do we want to have it there too? We currently don't have any use case, as far as I'm aware, in which we have a graph model with parametrizations.
There might be cases with graph models + parametrizations in the future, so let's change it there as well so that it works.
Once we have a QuantLinear with a parametrization registered, do we still need to use this new function, or can we fall back to type?
When you register the parametrization, the module's class changes from QuantLinear to ParametrizedQuantLinear, so I guess it's still needed, as type is going to return a class prefixed with Parametrized.
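For illustration, a minimal standalone sketch of that class-name change, using a plain nn.Linear and a toy Scale parametrization as stand-ins for the Brevitas classes (type_before_parametrizations comes from torch.nn.utils.parametrize):

```python
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

# Minimal parametrization just to trigger the class change; any parametrization works.
class Scale(nn.Module):
    def forward(self, weight):
        return 2.0 * weight

layer = nn.Linear(8, 8)
print(type(layer).__name__)  # Linear

parametrize.register_parametrization(layer, "weight", Scale())
print(type(layer).__name__)                                      # ParametrizedLinear
print(parametrize.type_before_parametrizations(layer).__name__)  # Linear
```

Since layer_map is keyed by the unparametrized classes, a lookup via type(model) would miss once a parametrization is registered, which is why the diff above switches to type_before_parametrizations.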
That could be a problem, since I assume it might break stuff like GPTQ. Would you mind checking?
But GPTQ modifies weights in-place, right? I don't think that fits well with parametrized rotations, since the weights are no longer a static tensor in memory; instead they are dynamically computed by running the forward of the parametrization modules, passing the original weight tensor as input. My understanding is that we would need to fuse the rotations before applying GPTQ, and therefore there won't be any problem, as the layers will again be QuantLinear.
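A minimal, self-contained sketch of that fusion step using PyTorch's parametrize utilities; the Rotation module below is a toy stand-in for the actual Brevitas rotation parametrization:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Rotation(nn.Module):
    """Toy stand-in for a rotation parametrization; not the Brevitas implementation."""
    def __init__(self, dim):
        super().__init__()
        self.rot = nn.Parameter(torch.eye(dim))

    def forward(self, weight):
        # While parametrized, the effective weight is recomputed from the stored
        # original tensor on every access, so in-place edits would not persist.
        return weight @ self.rot

layer = nn.Linear(8, 8)
parametrize.register_parametrization(layer, "weight", Rotation(8))

# Bake the current rotated value back into a plain weight tensor ("fuse" the rotation):
parametrize.remove_parametrizations(layer, "weight", leave_parametrized=True)

print(type(layer).__name__)  # Linear again: type-based dispatch sees the original class
```

With leave_parametrized=True the rotated values are written back into an ordinary weight tensor, so subsequent in-place updates, as GPTQ performs, behave as usual.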
Not sure; there could be a world where you first do GPTQ and then optimize the rotations afterwards.
Let's add a comment/warning when adding parametrized rotations that type checking might break.
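A minimal sketch of what such a warning could look like; the helper name and message below are hypothetical, not the actual Brevitas code:

```python
import warnings
import torch.nn.utils.parametrize as parametrize

def register_rotation_parametrization(module, rotation, tensor_name="weight"):
    """Hypothetical helper; the real registration lives in Brevitas' rotation/equalize code."""
    warnings.warn(
        f"Registering a rotation parametrization turns {type(module).__name__} into "
        f"Parametrized{type(module).__name__}; code that dispatches on type() may break. "
        "Use torch.nn.utils.parametrize.type_before_parametrizations() where needed.")
    parametrize.register_parametrization(module, tensor_name, rotation)
```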
Force-pushed from 06e5c1f to aefde7d
Reason for this PR
Changes Made in this PR
Testing Summary
Risk Highlight
Checklist
dev branch.