From 8f26c8499c88650421b57a7e7f86b518e76480a7 Mon Sep 17 00:00:00 2001
From: Giuseppe Franco
Date: Tue, 20 Aug 2024 13:19:39 +0100
Subject: [PATCH] Updated notebooks

---
 notebooks/minifloat_mx_tutorial.ipynb | 89 ++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 2 deletions(-)

diff --git a/notebooks/minifloat_mx_tutorial.ipynb b/notebooks/minifloat_mx_tutorial.ipynb
index cf11e5d36..da15aac5d 100644
--- a/notebooks/minifloat_mx_tutorial.ipynb
+++ b/notebooks/minifloat_mx_tutorial.ipynb
@@ -4,14 +4,59 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Minifloat and MX Example"
+    "# Minifloat and Groupwise quantization"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Work in progress examples to show how to use minifloat and MX with Brevitas"
+    "This notebook shows some practical use cases for minifloat and groupwise quantization.\n",
+    "\n",
+    "Brevitas supports a wide range of float quantization options, including the OCP and FNUZ FP8 standards.\n",
+    "It is possible to define any combination of exponent/mantissa bitwidths, as well as the exponent bias.\n",
+    "\n",
+    "Similarly, MX quantization is supported as general groupwise quantization on top of an integer/minifloat datatype.\n",
+    "This covers any general groupwise quantization scheme, including the MXInt and MXFloat standards.\n",
+    "\n",
+    "This tutorial shows how to instantiate and use some of the most interesting quantizers for minifloat and groupwise quantization."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Minifloat (FP8 and lower)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Brevitas offers some pre-defined quantizers for minifloat quantization, including the OCP and FNUZ standards, which can be further customized according to the specific use case.\n",
+    "The general naming structure for the quantizers is the following:\n",
+    "\n",
+    "`Fp\\<Bitwidth\\>\\<Standard\\>Weight\\<Scaling\\>Float`\n",
+    "\n",
+    "Where `Bitwidth` can be either empty or `e4m3`/`e5m2`, `Standard` can be empty or `OCP`/`FNUZ`, and `Scaling` can be empty or `PerTensor`/`PerChannel`.\n",
+    "\n",
+    "If `Bitwidth` is empty, the user must set it through kwargs or by subclassing the quantizers. Once the bitwidth is defined, the correct values for inf/nan are automatically selected based on the `Standard`.\n",
+    "If an invalid OCP bitwidth is set (e.g., `e6m1`), then no inf/nan values will be selected and the corresponding quantizer is not standard-compliant.\n",
+    "\n",
+    "`Standard` allows picking between the two main FP8 standards; moreover, Brevitas offers the possibility of doing minifloat quantization without reserving any values for inf/nan representation.\n",
+    "This makes the maximum available range usable, since in quantization, values that exceed the quantization range typically saturate to the maximum rather than mapping to inf/nan.\n",
+    "To use this third class of minifloat quantizers,\n",
+    "FNUZ quantizers need to have `saturating=True`.\n",
+    "\n",
+    "The `Scaling` option defines whether the quantization is _scaled_ or _unscaled_.\n",
+    "In the unscaled case, the scale factor for quantization is fixed to one; otherwise, it can be computed with any of the methods that Brevitas supports (e.g., statistics, learned, etc.).\n",
+    "\n",
+    "\n",
+    "Please keep in mind that not all combinations of the above options are necessarily pre-defined; the naming scheme serves mostly as an indication of what Brevitas supports.\n",
+    "Following the same structure as the available quantizers, it is possible to define new ones that fit your needs.\n",
+    "\n",
+    "\n",
+    "Similar considerations extend to activation quantization."
    ]
   },
   {
@@ -63,6 +108,46 @@
     "assert isinstance(intermediate_input, FloatQuantTensor)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Groupwise quantization (MXInt/MXFloat)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Groupwise quantization is built on top of integer/minifloat quantization, with special considerations to accommodate the groupwise scaling.\n",
+    "\n",
+    "Compared to the Int/Float QuantTensor, the main difference of their groupwise equivalents is that `value`, `scale`, and `zero_point` are no longer direct attributes but properties. The new attributes are `value_`, `scale_`, and `zero_point_`.\n",
+    "\n",
+    "The reason for this is shaping. When quantizing a tensor with shapes [O, I], where O is the output channels and I is the input channels, with group size k, groupwise quantization is normally represented as follows:\n",
+    "\n",
+    "- Tensor with shapes [O, I/k, k]\n",
+    "- Scales with shapes [O, I/k, 1]\n",
+    "- Zero point with the same shapes as the scales\n",
+    "\n",
+    "The alternative to this representation is to have all three tensors with shapes [O, I], at the cost of a massive increase in memory utilization, especially with QAT + gradients.\n",
+    "\n",
+    "The underscored attributes hold the compressed shapes, while the properties (non-underscored naming) dynamically compute the expanded version. This means:\n",
+    "```python\n",
+    "quant_tensor.scale_.shape\n",
+    "# This will print [O, I/k, 1]\n",
+    "quant_tensor.scale.shape\n",
+    "# This will print [O, I]\n",
+    "```\n",
+    "\n",
+    "With respect to pre-defined quantizers, Brevitas offers several Groupwise and MX options.\n",
+    "The main difference between the two is that MX is restricted to group_size=32 and the scale factor must be a power of 2.\n",
+    "The user can override these settings, but the corresponding output won't be MX-compliant.\n",
+    "\n",
+    "Another difference is that MXFloat relies on the OCP format as its underlying data type, while generic groupwise float relies on the non-standard minifloat representation explained above.\n",
+    "\n",
+    "Finally, the general groupwise scaling relies on float scales."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
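As a quick aside to the shaping discussion in the new groupwise markdown cell, the compressed vs. expanded representation can be sketched in plain NumPy. This is an illustrative sketch, not Brevitas code: the max-based scale computation and the grouping along the input dimension are assumptions for the example, and the names `scale_`/`scale` simply mirror the attribute/property split described above.

```python
import numpy as np

O, I, k = 4, 16, 8                       # output channels, input channels, group size
w = np.arange(O * I, dtype=np.float64).reshape(O, I)

# Compressed representation: one scale per group of k input channels
w_grouped = w.reshape(O, I // k, k)      # tensor viewed as [O, I/k, k]
scale_ = np.abs(w_grouped).max(axis=-1, keepdims=True)   # [O, I/k, 1]

# Expanded representation: broadcast each group scale back to [O, I]
scale = np.broadcast_to(scale_, (O, I // k, k)).reshape(O, I)

print(scale_.shape)  # (4, 2, 1)
print(scale.shape)   # (4, 16)
```

Keeping only the compressed tensor and broadcasting on demand is exactly what avoids materializing an [O, I]-sized scale (and its gradients) in memory.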