diff --git a/notebooks/03_anatomy_of_a_quantizer.ipynb b/notebooks/03_anatomy_of_a_quantizer.ipynb index 04b76a231..33a0d3cf7 100644 --- a/notebooks/03_anatomy_of_a_quantizer.ipynb +++ b/notebooks/03_anatomy_of_a_quantizer.ipynb @@ -11,7 +11,8 @@ "## What's in a Quantizer? \n", "\n", "In a broad sense, a quantizer is anything that implements a quantization technique, and the flexibility of Brevitas means that there are different ways to do so. \n", - "However, to keep our terminology straight, we refer to a quantizer as a specific kind of way to implement quantization, the one preferred and adopted by default. That is, a quantizer is a subclass of a `brevitas.inject.ExtendedInjector` that carries a `tensor_quant` attribute. \n", + "However, to keep our terminology straight, we refer to a quantizer as a specific kind of way to implement quantization, the one preferred and adopted by default. That is, a quantizer is a subclass of a `brevitas.inject.ExtendedInjector` that carries a `tensor_quant` attribute, which points to an instance of a torch `Module` that implements quantization.\n", + "\n", "We have seen in previous tutorials quantizers being imported from `brevitas.quant` and passed on to quantized layers. We can easily very what we just said on one of them:" ] }, @@ -112,7 +113,7 @@ "\n", "We need then a way to *declare* a quantization method such that it can be re-initialized and JIT compiled any time the `state_dict` changes. Because we want to support arbitrarly-complex user-defined quantization algorithms, this method has to be generic, i.e. it cannot depend on the specifics of the quantization algorithm implemented.\n", "\n", - "Implementing a quantizer with an `ExtendedInjector` is a way to do so. Specifically, an `ExtendedInjector` extends an `Injector` from an older version (0.2.1) of the excellent dependency-injection *[dependencies](https://github.com/proofit404/dependencies)* library with support for a couple of extra features that are specific to Brevitas' needs.\n", + "Implementing a quantizer with an `ExtendedInjector` is a way to do so. Specifically, an `ExtendedInjector` extends an `Injector` from an older version (*0.2.1*) of the excellent dependency-injection library *[dependencies](https://github.com/proofit404/dependencies)* with support for a couple of extra features that are specific to Brevitas' needs.\n", "\n", "An `Injector` (and an `ExtendedInjector`) allows to take what might be a very complicated graph of interwined objects and turns it into a flat list of variables that are capable of auto-assembly by matching variable names to arguments names. This technique typically goes under the name of auto-wiring dependency injection. 
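To make the auto-wiring idea concrete, here is a minimal toy sketch (the class and attribute names are made up and not part of the tutorial): an `ExtendedInjector` assembles objects by matching its attribute names against constructor argument names, which is the same mechanism the quantizers below use to wire `tensor_quant` to its dependencies.

```python
from brevitas.inject import ExtendedInjector

# Two plain classes; their constructor argument names ("radius", "circle")
# are what the injector matches against its own attribute names.
class Circle:
    def __init__(self, radius):
        self.radius = radius

class Drawing:
    def __init__(self, circle):
        self.circle = circle

class ToyInjector(ExtendedInjector):
    drawing = Drawing  # classes are instantiated on demand
    circle = Circle    # nested dependency, also resolved by name
    radius = 2.0       # plain values are injected as-is

# Accessing an attribute triggers auto-wiring:
# ToyInjector.drawing resolves to Drawing(circle=Circle(radius=2.0))
ToyInjector.drawing.circle.radius  # -> 2.0
```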
\n", "\n", @@ -252,10 +253,10 @@ { "data": { "text/plain": [ - "(tensor([[-0.1000, 0.1000, -0.1000, -0.1000],\n", - " [ 0.1000, 0.1000, 0.1000, 0.1000],\n", - " [ 0.1000, 0.1000, -0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, 0.1000, 0.1000]], grad_fn=),\n", + "(tensor([[ 0.1000, 0.1000, 0.1000, 0.1000],\n", + " [-0.1000, -0.1000, 0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000, 0.1000],\n", + " [-0.1000, -0.1000, 0.1000, 0.1000]], grad_fn=),\n", " tensor(0.1000, grad_fn=),\n", " tensor(0.),\n", " tensor(1.))" @@ -298,9 +299,9 @@ "data": { "text/plain": [ "(tensor([[ 0.1000, 0.1000, -0.1000, 0.1000],\n", - " [ 0.1000, 0.1000, 0.1000, -0.1000],\n", - " [-0.1000, -0.1000, -0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, -0.1000, 0.1000]], grad_fn=),\n", + " [ 0.1000, -0.1000, -0.1000, -0.1000],\n", + " [ 0.1000, -0.1000, -0.1000, -0.1000],\n", + " [-0.1000, -0.1000, 0.1000, -0.1000]], grad_fn=),\n", " tensor(0.1000, grad_fn=),\n", " tensor(0.),\n", " tensor(1.))" @@ -347,10 +348,10 @@ { "data": { "text/plain": [ - "(tensor([[ 1., -1., 1., -1.],\n", - " [ 1., -1., -1., 1.],\n", - " [ 1., -1., 1., 1.],\n", - " [-1., -1., 1., -1.]], grad_fn=),\n", + "(tensor([[ 1., -1., -1., -1.],\n", + " [-1., 1., 1., -1.],\n", + " [ 1., -1., 1., -1.],\n", + " [-1., -1., -1., 1.]], grad_fn=),\n", " tensor(1., grad_fn=),\n", " tensor(0.),\n", " tensor(1.))" @@ -384,26 +385,10 @@ { "data": { "text/plain": [ - "(tensor([[-0.1000, 0.1000, -0.1000, -0.1000, 0.1000, 0.1000, 0.1000, 0.1000,\n", - " 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, 0.1000, 0.1000, -0.1000, 0.1000, -0.1000, -0.1000,\n", - " 0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, -0.1000, 0.1000, 0.1000, -0.1000, 0.1000, -0.1000,\n", - " -0.1000, -0.1000],\n", - " [-0.1000, -0.1000, -0.1000, -0.1000, -0.1000, 0.1000, -0.1000, 0.1000,\n", - " -0.1000, -0.1000],\n", - " [ 0.1000, 0.1000, -0.1000, 0.1000, 0.1000, 0.1000, -0.1000, 0.1000,\n", - " 0.1000, -0.1000],\n", - " [ 0.1000, -0.1000, -0.1000, -0.1000, 0.1000, -0.1000, -0.1000, -0.1000,\n", - " 0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, -0.1000, 0.1000, 0.1000, -0.1000, 0.1000, 0.1000,\n", - " 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, -0.1000, -0.1000, 0.1000, 0.1000, -0.1000, -0.1000,\n", - " 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, 0.1000, -0.1000, -0.1000, -0.1000, 0.1000, -0.1000,\n", - " 0.1000, 0.1000],\n", - " [-0.1000, 0.1000, 0.1000, -0.1000, -0.1000, -0.1000, -0.1000, -0.1000,\n", - " -0.1000, -0.1000]], grad_fn=),\n", + "(tensor([[ 0.1000, 0.1000, -0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, -0.1000, 0.1000],\n", + " [ 0.1000, -0.1000, -0.1000, -0.1000]], grad_fn=),\n", " tensor(0.1000, grad_fn=),\n", " tensor(0.),\n", " tensor(1.))" @@ -426,7 +411,7 @@ " pass\n", "\n", "comp_inj_tensor_quant = MyComposedBinaryQuantizer.tensor_quant\n", - "comp_inj_tensor_quant(torch.randn(10, 10))" + "comp_inj_tensor_quant(torch.randn(4, 4))" ] }, { @@ -446,30 +431,30 @@ { "data": { "text/plain": [ - "QuantTensor(value=tensor([[[[ 0.1000, -0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, 0.1000],\n", - " [-0.1000, -0.1000, 0.1000]],\n", + "QuantTensor(value=tensor([[[[ 0.1000, 0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000]],\n", "\n", - " [[-0.1000, -0.1000, 0.1000],\n", - " [-0.1000, 0.1000, -0.1000],\n", - " [ 0.1000, 0.1000, -0.1000]],\n", + " [[-0.1000, 0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000],\n", + " [ 0.1000, -0.1000, -0.1000]],\n", "\n", - " [[ 0.1000, 0.1000, -0.1000],\n", - " [-0.1000, -0.1000, -0.1000],\n", - 
" [-0.1000, 0.1000, -0.1000]]],\n", + " [[-0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, 0.1000],\n", + " [-0.1000, -0.1000, -0.1000]]],\n", "\n", "\n", - " [[[ 0.1000, 0.1000, 0.1000],\n", + " [[[ 0.1000, 0.1000, -0.1000],\n", " [-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, 0.1000, 0.1000]],\n", + " [ 0.1000, 0.1000, 0.1000]],\n", "\n", - " [[-0.1000, -0.1000, -0.1000],\n", - " [ 0.1000, -0.1000, -0.1000],\n", - " [-0.1000, 0.1000, 0.1000]],\n", + " [[ 0.1000, -0.1000, 0.1000],\n", + " [-0.1000, -0.1000, -0.1000],\n", + " [-0.1000, 0.1000, -0.1000]],\n", "\n", - " [[ 0.1000, -0.1000, -0.1000],\n", - " [-0.1000, 0.1000, 0.1000],\n", - " [ 0.1000, 0.1000, 0.1000]]]], grad_fn=), scale=tensor(0.1000, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=None, training=True)" + " [[ 0.1000, -0.1000, 0.1000],\n", + " [-0.1000, -0.1000, 0.1000],\n", + " [ 0.1000, -0.1000, 0.1000]]]], grad_fn=), scale=tensor(0.1000, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=None, training=True)" ] }, "execution_count": 10, @@ -527,30 +512,30 @@ { "data": { "text/plain": [ - "QuantTensor(value=tensor([[[[-0.1000, -0.1000, -0.1000],\n", - " [ 0.1000, 0.1000, -0.1000],\n", - " [ 0.1000, -0.1000, 0.1000]],\n", - "\n", - " [[-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, 0.1000],\n", - " [ 0.1000, -0.1000, 0.1000]],\n", - "\n", - " [[-0.1000, 0.1000, -0.1000],\n", - " [-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, 0.1000, -0.1000]]],\n", - "\n", + "QuantTensor(value=tensor([[[[-0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, -0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, -0.1000]],\n", "\n", - " [[[-0.1000, -0.1000, -0.1000],\n", - " [-0.1000, 0.1000, -0.1000],\n", + " [[ 0.1000, -0.1000, -0.1000],\n", + " [ 0.1000, -0.1000, -0.1000],\n", " [ 0.1000, -0.1000, -0.1000]],\n", "\n", " [[-0.1000, -0.1000, 0.1000],\n", + " [-0.1000, 0.1000, -0.1000],\n", + " [-0.1000, -0.1000, 0.1000]]],\n", + "\n", + "\n", + " [[[-0.1000, 0.1000, 0.1000],\n", " [-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, -0.1000]],\n", + " [ 0.1000, -0.1000, 0.1000]],\n", "\n", " [[-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, 0.1000, 0.1000],\n", - " [-0.1000, -0.1000, -0.1000]]]], grad_fn=), scale=tensor(0.1000, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" + " [-0.1000, 0.1000, -0.1000],\n", + " [-0.1000, 0.1000, 0.1000]],\n", + "\n", + " [[-0.1000, -0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, -0.1000],\n", + " [-0.1000, -0.1000, 0.1000]]]], grad_fn=), scale=tensor(0.1000, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" ] }, "execution_count": 12, @@ -575,7 +560,7 @@ { "data": { "text/plain": [ - "True" + "False" ] }, "execution_count": 13, @@ -593,7 +578,7 @@ "source": [ "And now the quant weights are valid.\n", "\n", - "When we want to add or override an single attribute of a quantizer passed to a layer, defining a whole new quantizer can be too verbose. There is a simpler syntax to achieve the same goal. Let's say we want to have `scaling_init = 0.0001`. We can simply do the following:" + "When we want to add or override an single attribute of a quantizer passed to a layer, defining a whole new quantizer can be too verbose. There is a simpler syntax to achieve the same goal. Let's say we want to have add the `signed` attribute to `MyBinaryQuantizer`, as we just did. 
We could have also simply done the following:" ] }, { @@ -604,30 +589,30 @@ { "data": { "text/plain": [ - "QuantTensor(value=tensor([[[[-0.0010, 0.0010, -0.0010],\n", - " [-0.0010, -0.0010, -0.0010],\n", - " [-0.0010, 0.0010, -0.0010]],\n", + "QuantTensor(value=tensor([[[[ 0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, 0.1000],\n", + " [ 0.1000, -0.1000, -0.1000]],\n", "\n", - " [[ 0.0010, -0.0010, -0.0010],\n", - " [-0.0010, 0.0010, -0.0010],\n", - " [-0.0010, 0.0010, -0.0010]],\n", + " [[-0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, 0.1000, 0.1000],\n", + " [-0.1000, -0.1000, -0.1000]],\n", "\n", - " [[ 0.0010, -0.0010, -0.0010],\n", - " [-0.0010, -0.0010, 0.0010],\n", - " [ 0.0010, 0.0010, 0.0010]]],\n", + " [[-0.1000, -0.1000, 0.1000],\n", + " [-0.1000, 0.1000, -0.1000],\n", + " [ 0.1000, -0.1000, 0.1000]]],\n", "\n", "\n", - " [[[-0.0010, 0.0010, 0.0010],\n", - " [ 0.0010, 0.0010, -0.0010],\n", - " [-0.0010, -0.0010, 0.0010]],\n", + " [[[ 0.1000, -0.1000, -0.1000],\n", + " [-0.1000, 0.1000, 0.1000],\n", + " [-0.1000, 0.1000, -0.1000]],\n", "\n", - " [[-0.0010, -0.0010, 0.0010],\n", - " [-0.0010, 0.0010, -0.0010],\n", - " [ 0.0010, 0.0010, -0.0010]],\n", + " [[ 0.1000, -0.1000, 0.1000],\n", + " [-0.1000, 0.1000, 0.1000],\n", + " [ 0.1000, -0.1000, -0.1000]],\n", "\n", - " [[-0.0010, -0.0010, 0.0010],\n", - " [ 0.0010, 0.0010, 0.0010],\n", - " [-0.0010, -0.0010, -0.0010]]]], grad_fn=), scale=tensor(0.0010, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" + " [[ 0.1000, -0.1000, -0.1000],\n", + " [-0.1000, -0.1000, -0.1000],\n", + " [-0.1000, -0.1000, 0.1000]]]], grad_fn=), scale=tensor(0.1000, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" ] }, "execution_count": 14, @@ -636,7 +621,7 @@ } ], "source": [ - "small_scale_quant_conv = QuantConv2d(3, 2, (3,3), weight_quant=MySignedBinaryQuantizer, weight_scaling_init=0.001)\n", + "small_scale_quant_conv = QuantConv2d(3, 2, (3,3), weight_quant=MyBinaryQuantizer, weight_signed=True)\n", "small_scale_quant_conv.quant_weight()" ] }, @@ -644,7 +629,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "What we did was to take the name of the attribute `scaling_init`, add the prefix `weight_`, and pass it as a keyword argument to `QuantConv2d`. What happens in the background is that the keyword arguments prefixed with `weight_` are set as attributes of `weight_quant`, possibly overriding any pre-existing value. The same principle applies to `input_`, `output_` and `bias_`. In case the corrisponding quantizer is `None`, a new *empty* `ExtendedInjector` is internally allocated and attributes are passed to that. So for example we can define `output_quant` (which by default is None) completely *inline* without defining an explicit standalone quantizer: " + "What we did was to take the name of the attribute `scaling_init`, add the prefix `weight_`, and pass it as a keyword argument to `QuantConv2d`. What happens in the background is that the keyword arguments prefixed with `weight_` are set as attributes of `weight_quant`, possibly overriding any pre-existing value. 
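As a sketch of this equivalence (the attribute values are arbitrary and `MyCustomQuantizer` is just an illustrative name), the keyword form and an explicit quantizer subclass yield the same configuration:

```python
# Keyword form: quantizer attributes passed with the `weight_` prefix.
conv_a = QuantConv2d(
    3, 2, (3, 3), weight_quant=MyBinaryQuantizer,
    weight_signed=True,          # adds/overrides the `signed` attribute
    weight_scaling_init=0.001)   # overrides the `scaling_init` attribute

# Explicit form: the same attributes declared on a quantizer subclass.
class MyCustomQuantizer(MyBinaryQuantizer):
    signed = True
    scaling_init = 0.001

conv_b = QuantConv2d(3, 2, (3, 3), weight_quant=MyCustomQuantizer)
```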
The same principle applies to `input_`, `output_` and `bias_`.\n", + "In case the corrisponding quantizer is `None`, a new *empty* `ExtendedInjector` is internally allocated and keyword arguments are passed as attributes to that.\n", + "\n", + "So for example we can define `output_quant` (which by default is None) completely *inline* without defining an explicit standalone quantizer:" ] }, { @@ -655,13 +643,13 @@ { "data": { "text/plain": [ - "tensor([[[[ 1., -1., 1.],\n", + "tensor([[[[-1., -1., -1.],\n", " [-1., 1., -1.],\n", - " [ 1., 1., -1.]],\n", + " [-1., 1., -1.]],\n", "\n", - " [[ 1., -1., 1.],\n", - " [-1., 1., 1.],\n", - " [ 1., 1., 1.]]]], grad_fn=)" + " [[ 1., 1., -1.],\n", + " [-1., 1., -1.],\n", + " [-1., 1., 1.]]]], grad_fn=)" ] }, "execution_count": 15, @@ -759,10 +747,11 @@ "source": [ "## A Custom Quantizer initialized with Weight Statistics\n", "\n", - "So far we have seen use-cases where an `ExtendedInjector` provides at best a different kind of syntax to define a quantizer, without any particular other advantage. Let's now make things a bit more complicated to show the sort of situations where it really shines. \n", + "So far we have seen use-cases where an `ExtendedInjector` provides, at best, a different kind of syntax to define a quantizer, without any particular other advantage. Let's now make things a bit more complicated to show the sort of situations where it really shines.\n", "\n", "Let's say we want to define a binary weight quantizer where `scaling_impl` is still `ParameterScaling`. However, instead of being user-defined, we want `scaling_init` to be the maximum value found in the weight tensor of the quantized layer.\n", - "To support this sort of use cases where the quantizer depends on the layer, a quantized layer automatically passes itself to all its quantizers under the name of `module`. With some machinery then, we can achieve our goal:" + "To support this sort of use cases where the quantizer depends on the layer, a quantized layer automatically passes itself to all its quantizers under the name of `module`.\n", + "With only a few lines of code then, we can achieve our goal:" ] }, { @@ -784,7 +773,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note how we are leveraging the `@value` *decorator* to define a function that is executed at *dependency-injection (DI) time*. This kind of behaviour is similar to defining a @property instead of an attribute, with the difference that a @value function can depend on other attributes of the Injector, which are automatically passed in as arguments of the function during DI. \n", + "Note how we are leveraging the `@value` *decorator* to define a function that is executed at *dependency-injection (DI) time*. 
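For illustration, here is a small additional sketch (the `base_scale` attribute and the quantizer name are made up; `value` is the decorator imported earlier in the tutorial): a `@value` function is re-evaluated whenever the quantizer is re-initialized, with its arguments resolved from the quantizer's own attributes.

```python
class DerivedScaleQuantizer(MySignedBinaryQuantizer):
    base_scale = 0.05  # an ordinary attribute of the quantizer

    @value
    def scaling_init(base_scale):
        # executed at dependency-injection time; `base_scale` is looked up
        # by name among the attributes of this quantizer
        return 2 * base_scale
```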
This kind of behaviour is similar in spirit to defining a `@property` instead of an *attribute*, with the difference that a `@value` function can depend on other attributes of the Injector, which are automatically passed in as arguments of the function during DI.\n", "\n", "Let's now pass the quantizer to a QuantConv2d and retrieve its quantized weights:" ] @@ -799,30 +788,30 @@ { "data": { "text/plain": [ - "QuantTensor(value=tensor([[[[-0.1848, -0.1848, -0.1848],\n", - " [ 0.1848, 0.1848, -0.1848],\n", - " [ 0.1848, 0.1848, 0.1848]],\n", + "QuantTensor(value=tensor([[[[-0.1912, -0.1912, 0.1912],\n", + " [ 0.1912, -0.1912, -0.1912],\n", + " [-0.1912, -0.1912, 0.1912]],\n", "\n", - " [[-0.1848, 0.1848, 0.1848],\n", - " [ 0.1848, -0.1848, -0.1848],\n", - " [ 0.1848, -0.1848, 0.1848]],\n", + " [[-0.1912, -0.1912, -0.1912],\n", + " [-0.1912, 0.1912, 0.1912],\n", + " [ 0.1912, -0.1912, -0.1912]],\n", "\n", - " [[-0.1848, 0.1848, 0.1848],\n", - " [-0.1848, 0.1848, 0.1848],\n", - " [-0.1848, -0.1848, -0.1848]]],\n", + " [[ 0.1912, 0.1912, 0.1912],\n", + " [ 0.1912, 0.1912, -0.1912],\n", + " [-0.1912, -0.1912, -0.1912]]],\n", "\n", "\n", - " [[[ 0.1848, -0.1848, 0.1848],\n", - " [-0.1848, -0.1848, 0.1848],\n", - " [ 0.1848, -0.1848, 0.1848]],\n", + " [[[-0.1912, 0.1912, -0.1912],\n", + " [ 0.1912, 0.1912, -0.1912],\n", + " [ 0.1912, 0.1912, -0.1912]],\n", "\n", - " [[ 0.1848, -0.1848, 0.1848],\n", - " [ 0.1848, -0.1848, -0.1848],\n", - " [-0.1848, -0.1848, 0.1848]],\n", + " [[-0.1912, 0.1912, -0.1912],\n", + " [-0.1912, 0.1912, -0.1912],\n", + " [ 0.1912, -0.1912, -0.1912]],\n", "\n", - " [[-0.1848, -0.1848, -0.1848],\n", - " [ 0.1848, 0.1848, -0.1848],\n", - " [ 0.1848, -0.1848, -0.1848]]]], grad_fn=), scale=tensor(0.1848, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" + " [[ 0.1912, -0.1912, -0.1912],\n", + " [ 0.1912, 0.1912, 0.1912],\n", + " [ 0.1912, 0.1912, 0.1912]]]], grad_fn=), scale=tensor(0.1912, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" ] }, "execution_count": 19, @@ -877,7 +866,7 @@ { "data": { "text/plain": [ - "tensor(0.1913, grad_fn=)" + "tensor(0.1880, grad_fn=)" ] }, "execution_count": 21, @@ -913,11 +902,11 @@ "evalue": "Error(s) in loading state_dict for QuantConv2d:\n\tMissing key(s) in state_dict: \"weight_quant.tensor_quant.scaling_impl.value\". 
", "output_type": "error", "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)", - "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mparam_from_max_quant_conv\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload_state_dict\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfloat_conv\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mstate_dict\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[1;32mC:\\ProgramData\\Miniconda3\\envs\\pytorch_latest\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m in \u001b[0;36mload_state_dict\u001b[1;34m(self, state_dict, strict)\u001b[0m\n\u001b[0;32m 1049\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1050\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0merror_msgs\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1051\u001b[1;33m raise RuntimeError('Error(s) in loading state_dict for {}:\\n\\t{}'.format(\n\u001b[0m\u001b[0;32m 1052\u001b[0m self.__class__.__name__, \"\\n\\t\".join(error_msgs)))\n\u001b[0;32m 1053\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0m_IncompatibleKeys\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmissing_keys\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0munexpected_keys\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;31mRuntimeError\u001b[0m: Error(s) in loading state_dict for QuantConv2d:\n\tMissing key(s) in state_dict: \"weight_quant.tensor_quant.scaling_impl.value\". 
" + "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m", + "\u001B[1;31mRuntimeError\u001B[0m Traceback (most recent call last)", + "\u001B[1;32m\u001B[0m in \u001B[0;36m\u001B[1;34m\u001B[0m\n\u001B[1;32m----> 1\u001B[1;33m \u001B[0mparam_from_max_quant_conv\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mload_state_dict\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mfloat_conv\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mstate_dict\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0m", + "\u001B[1;32mC:\\ProgramData\\Miniconda3\\envs\\pytorch_latest\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001B[0m in \u001B[0;36mload_state_dict\u001B[1;34m(self, state_dict, strict)\u001B[0m\n\u001B[0;32m 1049\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m 1050\u001B[0m \u001B[1;32mif\u001B[0m \u001B[0mlen\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0merror_msgs\u001B[0m\u001B[1;33m)\u001B[0m \u001B[1;33m>\u001B[0m \u001B[1;36m0\u001B[0m\u001B[1;33m:\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[1;32m-> 1051\u001B[1;33m raise RuntimeError('Error(s) in loading state_dict for {}:\\n\\t{}'.format(\n\u001B[0m\u001B[0;32m 1052\u001B[0m self.__class__.__name__, \"\\n\\t\".join(error_msgs)))\n\u001B[0;32m 1053\u001B[0m \u001B[1;32mreturn\u001B[0m \u001B[0m_IncompatibleKeys\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mmissing_keys\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0munexpected_keys\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n", + "\u001B[1;31mRuntimeError\u001B[0m: Error(s) in loading state_dict for QuantConv2d:\n\tMissing key(s) in state_dict: \"weight_quant.tensor_quant.scaling_impl.value\". " ] } ], @@ -929,8 +918,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Ouch, we get an error. This is because `ParameterScaling` contains a learned parameter, and Pytorch expects all learned parameters of a model to be contained in a state dict. \n", - "We can work around the issue by either setting an appropriate config flag in Brevitas, or by passing `strict=False` to load_state_dict. We go with the first way as setting `strict=False` is too forgiving to other kind of problems:" + "Ouch, we get an error. This is because `ParameterScaling` contains a learned `torch.nn.Parameter`, and Pytorch expects all learned parameters of a model to be contained in a `state_dict` that is being loaded.\n", + "We can work around the issue by either setting the `IGNORE_MISSING_KEYS` config flag in Brevitas, or by passing `strict=False` to load_state_dict. 
We go with the former as setting `strict=False` is too forgiving to other kind of problems:" ] }, { @@ -962,7 +951,7 @@ "source": [ "Note that we could have also achieve the same goal by setting the *env variable* `BREVITAS_IGNORE_MISSING_KEYS=1`.\n", "\n", - "And now we can look at the quantized weights again:" + "And now if we take a look at the quantized weights again:" ] }, { @@ -973,30 +962,30 @@ { "data": { "text/plain": [ - "QuantTensor(value=tensor([[[[ 0.1913, 0.1913, -0.1913],\n", - " [-0.1913, -0.1913, 0.1913],\n", - " [-0.1913, -0.1913, 0.1913]],\n", + "QuantTensor(value=tensor([[[[-0.1880, 0.1880, 0.1880],\n", + " [-0.1880, -0.1880, -0.1880],\n", + " [ 0.1880, 0.1880, -0.1880]],\n", "\n", - " [[-0.1913, -0.1913, -0.1913],\n", - " [ 0.1913, 0.1913, 0.1913],\n", - " [-0.1913, 0.1913, 0.1913]],\n", + " [[-0.1880, -0.1880, 0.1880],\n", + " [ 0.1880, -0.1880, 0.1880],\n", + " [-0.1880, 0.1880, -0.1880]],\n", "\n", - " [[ 0.1913, 0.1913, 0.1913],\n", - " [-0.1913, 0.1913, -0.1913],\n", - " [-0.1913, -0.1913, 0.1913]]],\n", + " [[ 0.1880, 0.1880, 0.1880],\n", + " [ 0.1880, -0.1880, 0.1880],\n", + " [-0.1880, 0.1880, -0.1880]]],\n", "\n", "\n", - " [[[ 0.1913, 0.1913, 0.1913],\n", - " [ 0.1913, 0.1913, 0.1913],\n", - " [-0.1913, -0.1913, -0.1913]],\n", + " [[[-0.1880, -0.1880, 0.1880],\n", + " [-0.1880, -0.1880, -0.1880],\n", + " [ 0.1880, -0.1880, 0.1880]],\n", "\n", - " [[-0.1913, -0.1913, 0.1913],\n", - " [-0.1913, 0.1913, -0.1913],\n", - " [-0.1913, 0.1913, -0.1913]],\n", + " [[ 0.1880, 0.1880, 0.1880],\n", + " [ 0.1880, 0.1880, 0.1880],\n", + " [ 0.1880, -0.1880, 0.1880]],\n", "\n", - " [[-0.1913, -0.1913, -0.1913],\n", - " [ 0.1913, -0.1913, -0.1913],\n", - " [-0.1913, 0.1913, 0.1913]]]], grad_fn=), scale=tensor(0.1913, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" + " [[-0.1880, -0.1880, -0.1880],\n", + " [ 0.1880, -0.1880, 0.1880],\n", + " [ 0.1880, 0.1880, -0.1880]]]], grad_fn=), scale=tensor(0.1880, grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" ] }, "execution_count": 24, @@ -1012,7 +1001,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As expected, the scale factor has been updated to the new `weight.abs().max()`. \n", + "We see that, as expected, the scale factor has been updated to the new `weight.abs().max()`.\n", "\n", "What happens internally is that after `load_state_dict` is called on the layer, `ParamFromMaxWeightQuantizer.tensor_quant` gets called again to re-initialize `BinaryQuant`, and in turn `ParameterScaling` is re-initialized with a new `scaling_init` value computed based on the updated `module.weight` tensor. This whole process wouldn't have been possible without an `ExtendedInjector` behind it. " ] @@ -1021,17 +1010,261 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Building a custom quantization API\n", + "## Sharing a Quantizer\n", "\n", - "Now let's make things even more complicated. We are going to look at a scenario that illustrates the differences between a standard `Injector` (implemented in the dependencies library) and our `ExtendedInjector` extension. \n", + "There are two ways to share a quantizer between multiple layers, with importance differences.\n", + "\n", + "The first one, which we have seen so far, is to simply pass the same ExtendedInjector to multiple layers. What that does is sharing the same quantization strategy among different layers. Each layer still gets its own instance of the quantization implementation." 
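For instance, re-using the `ParamFromMaxWeightQuantizer` defined earlier (a sketch; the exact outcome depends on the random weight initialization), two layers driven by the same quantizer class each derive their own scale from their own weight tensor:

```python
conv_a = QuantConv2d(3, 2, (3, 3), weight_quant=ParamFromMaxWeightQuantizer)
conv_b = QuantConv2d(3, 2, (3, 3), weight_quant=ParamFromMaxWeightQuantizer)

# Same quantization strategy, but two independent instances and two scales:
(conv_a.quant_weight_scale() == conv_b.quant_weight_scale()).item()  # False in general
```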
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quant_conv1 = QuantConv2d(3, 2, (3, 3), weight_quant=MySignedBinaryQuantizer)\n", + "quant_conv2 = QuantConv2d(3, 2, (3, 3), weight_quant=MySignedBinaryQuantizer)\n", + "\n", + "quant_conv1.weight_quant is quant_conv2.weight_quant" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sharing an instance of Weight Quantization\n", + "\n", + "The second one, which we are introducing now, allows to share the same quantization instance among multiple layers. This can be useful in those scenarios where, for example, we want different layers to share the same scale factor. The syntax goes as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quant_conv1 = QuantConv2d(3, 2, (3, 3), weight_quant=MySignedBinaryQuantizer)\n", + "quant_conv2 = QuantConv2d(3, 2, (3, 3), weight_quant=quant_conv1.weight_quant)\n", + "\n", + "quant_conv1.weight_quant is quant_conv2.weight_quant" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(quant_conv1.quant_weight_scale() == quant_conv2.quant_weight_scale()).item()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens in background is that the weight quantizer now has access to both `quant_conv1` and `quant_conv2`. \n", + "So let's say we want to build a quantizer similar to `ParamFromMaxWeightQuantizer`, but in this case we want the scale factor to be initialized with the average of both weight tensors. When a quantizer has access to multiple parent modules, they are passed in at dependency injection time as a *tuple* under the same name `module` as before. 
So we can do the following:" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "class SharedParamFromMeanWeightQuantizer(MySignedBinaryQuantizer):\n", + " \n", + " @value\n", + " def scaling_init(module):\n", + " if isinstance(module, tuple):\n", + " return torch.cat((module[0].weight.view(-1), module[1].weight.view(-1))).abs().mean()\n", + " else:\n", + " return module.weight.abs().mean()\n", + " \n", + "quant_conv1 = QuantConv2d(3, 2, (3, 3), weight_quant=SharedParamFromMeanWeightQuantizer)\n", + "old_quant_conv1_scale = quant_conv1.quant_weight_scale()\n", + "quant_conv2 = QuantConv2d(3, 2, (3, 3), weight_quant=quant_conv1.weight_quant)\n", + "new_quant_conv1_scale = quant_conv1.quant_weight_scale()\n", + "\n", + "(old_quant_conv1_scale == new_quant_conv1_scale).item()" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(new_quant_conv1_scale == quant_conv2.quant_weight_scale()).item()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note how, when `quant_conv2` is initialized using the `weight_quant` of `quant_conv1`, weight quantization is re-initialized for both layers such that they end up having the same scale.\n", + "\n", + "We can see in this example how Brevitas works consistently with Pytorch's eager execution model. When we initialize `quant_conv1` we still don't know that its weight quantizer is going to be shared with `quant_conv2`, and the semantics of Pytorch impose that `quant_conv1` should work correctly both before and after `quant_conv2` is declared. The way we take advantage of dependency injection allows to do so." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sharing an instance of Activation Quantization\n", + "\n", + "Sharing an instance of activation quantization is easier because for most scenarios it's enough to simply share the whole layer itself, e.g. calling the same `QuantReLU` from multiple places in the forward pass.\n", + "\n", + "For those scenarios where sharing the whole layer is not possible, there is something important to keep in mind. Instances of activation quantization include (for performance reasons) the implementation of the non-linear activation itself (if any). So, for example, using a `QuantReLU.act_quant` to initialize a `QuantConv2d.output_quant` should be avoided as we would not share not only the quantizer, but also the relu activation function. \n", + "In general then sharing of instances of activations quantization should be done only between activations of the same *kind*. \n", + "\n", + "*Note*: we say kind and not type because `input_quant`, `output_quant` and `IdentityQuant` count as being the same kind of activation, even though they belong to different type of layers." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dealing with Weight Initialization\n", + "\n", + "There is a type of situation that Brevitas cannot deal with automatically. 
That is, when the initialization of the quantizer depends on the layer to which it is applied (like with the `ParamFromMaxWeightQuantizer` or `SharedParamFromMeanWeightQuantizer` quantizers), but the layer gets modified after it is initialized. \n", + "\n", + "The typical example is with weight initialization when training from scratch (so rather than loading from a floating-point state_dict):" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 68, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quant_conv_w_init = QuantConv2d(3, 2, (3, 3), weight_quant=ParamFromMaxWeightQuantizer)\n", + "torch.nn.init.uniform_(quant_conv_w_init.weight)\n", + "\n", + "(quant_conv_w_init.weight.abs().max() == quant_conv_w_init.quant_weight_scale()).item()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see how the scale factor is not initialized correctly anymore. In this case we can simply trigger re-initialization of the weight quantizer manually:" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quant_conv_w_init.weight_quant.init_tensor_quant()\n", + "\n", + "(quant_conv_w_init.weight.abs().max() == quant_conv_w_init.quant_weight_scale()).item()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Note*: because the way weights are initialized is often the same as to how an optimizer performs the weight update step, there are currently no plans to try to perform re-initialization automatically (as it happens e.g. when a `state_dict` is loaded) since it wouldn't be possible to distinguish between the two scenarios." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building a Custom Quantization API\n", + "\n", + "Finally, let's go through an even more complicated example. We are going to look at a scenario that illustrates the differences between a standard `Injector` (implemented in the dependencies library) and our `ExtendedInjector` extension.\n", + "\n", + "Let's say we want to build two quantizers for respectively weights and activations and build a simple API on top of them.\n", + "In particular, we want to be able to switch between `BinaryQuant` and `ClampedBinaryQuant` (a variant of binary quantization with clamping), and we want to optionally perform *per-channel scaling*.\n", + "To do so, we are going to implement the controlling logic through a hierarchy of ExtendedInjector, leaving two boolean flags exposed as arguments of the quantizers, with the idea then that the flags can be set through keyword arguments of the respective quantized layers.\n", "\n", - "Let's say we want to build two quantizers for respectively weights and activations and build a simple API on top of them. In particular, we want to be able to switch between `BinaryQuant` and `ClampedBinaryQuant` (a variant of binary quantization with clamping typically used in activation quantization), and we want to optionally perform *per-channel scaling*. \n", "We can go as follows:" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1087,7 +1320,7 @@ "source": [ "There are a bunch of things going on here to unpack. 
\n", "\n", - "The first one is that a `@value` function can return a class to auto-wire and inject, as seen in the definition of `tensor_quant`. This wouldn't normally be possible with a standard `Injector`, but it's possible with an `ExtendedInjector`. This way we can switch between different implementations of `tensor_quant` with a boolean flag. \n", + "The first one is that a `@value` function can return a class to auto-wire and inject, as seen in the definition of `tensor_quant`. This wouldn't normally be possible with a standard `Injector`, but it's possible with an `ExtendedInjector`. This way we can switch between different implementations of `tensor_quant`.\n", "\n", "The second one is the special object `this`. `this` is already present in the *dependencies* library, and it's used as a way to retrieve attributes of the quantizer from within the quantizer itself. However, normally it wouldn't be possible to return a reference to `this` from a `@value` function. Again this is something that only a `ExtendedInjector` supports, and it allows to chain different attributes in a way such that the chained values are computed only when necessary. \n", "\n", @@ -1096,46 +1329,9 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "QuantTensor(value=tensor([[[[ 0.1718, -0.1718, -0.1718],\n", - " [ 0.1718, -0.1718, -0.1718],\n", - " [-0.1718, 0.1718, -0.1718]],\n", - "\n", - " [[-0.1718, 0.1718, -0.1718],\n", - " [-0.1718, -0.1718, -0.1718],\n", - " [ 0.1718, 0.1718, -0.1718]],\n", - "\n", - " [[-0.1718, 0.1718, 0.1718],\n", - " [-0.1718, -0.1718, -0.1718],\n", - " [ 0.1718, -0.1718, -0.1718]]],\n", - "\n", - "\n", - " [[[-0.1897, -0.1897, 0.1897],\n", - " [ 0.1897, 0.1897, 0.1897],\n", - " [-0.1897, -0.1897, 0.1897]],\n", - "\n", - " [[-0.1897, -0.1897, 0.1897],\n", - " [ 0.1897, 0.1897, -0.1897],\n", - " [-0.1897, -0.1897, -0.1897]],\n", - "\n", - " [[-0.1897, -0.1897, -0.1897],\n", - " [ 0.1897, -0.1897, 0.1897],\n", - " [-0.1897, 0.1897, 0.1897]]]], grad_fn=), scale=tensor([[[[0.1718]]],\n", - "\n", - "\n", - " [[[0.1897]]]], grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "per_channel_quant_conv = QuantConv2d(\n", " 3, 2, (3, 3), \n", @@ -1154,46 +1350,9 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "QuantTensor(value=tensor([[[[ 0.1913, 0.1913, -0.1913],\n", - " [-0.1913, -0.1913, 0.1913],\n", - " [-0.1913, -0.1913, 0.1913]],\n", - "\n", - " [[-0.1913, -0.1913, -0.1913],\n", - " [ 0.1913, 0.1913, 0.1913],\n", - " [-0.1913, 0.1913, 0.1913]],\n", - "\n", - " [[ 0.1913, 0.1913, 0.1913],\n", - " [-0.1913, 0.1913, -0.1913],\n", - " [-0.1913, -0.1913, 0.1913]]],\n", - "\n", - "\n", - " [[[ 0.1889, 0.1889, 0.1889],\n", - " [ 0.1889, 0.1889, 0.1889],\n", - " [-0.1889, -0.1889, -0.1889]],\n", - "\n", - " [[-0.1889, -0.1889, 0.1889],\n", - " [-0.1889, 0.1889, -0.1889],\n", - " [-0.1889, 0.1889, -0.1889]],\n", - "\n", - " [[-0.1889, -0.1889, -0.1889],\n", - " [ 0.1889, -0.1889, -0.1889],\n", - " [-0.1889, 0.1889, 0.1889]]]], grad_fn=), scale=tensor([[[[0.1913]]],\n", - "\n", - "\n", - " [[[0.1889]]]], grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" - ] - }, - "execution_count": 27, - "metadata": {}, - 
"output_type": "execute_result" - } - ], + "outputs": [], "source": [ "per_channel_quant_conv.load_state_dict(float_conv.state_dict())\n", "per_channel_quant_conv.quant_weight()" @@ -1210,23 +1369,9 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "tensor([[-0.0100, -0.0100, 0.0100, 0.0100],\n", - " [-0.0100, 0.0100, -0.0100, 0.0100],\n", - " [ 0.0100, 0.0100, 0.0100, 0.0100],\n", - " [-0.0100, 0.0100, -0.0100, 0.0100]], grad_fn=)" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "from brevitas.nn import QuantIdentity\n", "\n", @@ -1239,38 +1384,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note how `AdvancedActQuantizer` doesn't define a `per_channel_broadcastable_shape`, yet no errors are triggered. This is because `this.per_channel_broadcastable_shape` is required only when `scaling_per_output_channel` is `True`, while in this case `scaling_per_output_channel` is `False`. Let' try that then:" + "Note how `AdvancedActQuantizer` doesn't define a `per_channel_broadcastable_shape`, yet no errors are triggered. This is because `this.per_channel_broadcastable_shape` is required only when `scaling_per_output_channel` is `True`, while in this case `scaling_per_output_channel` is `False`.\n", + "Let' try to set it to `True` then:" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": { "tags": [ "raises-exception" ] }, - "outputs": [ - { - "ename": "DependencyError", - "evalue": "'AdvancedActQuantizer' can not resolve attribute 'per_channel_broadcastable_shape'", - "output_type": "error", - "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mDependencyError\u001b[0m Traceback (most recent call last)", - "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mbrevitas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnn\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mQuantIdentity\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m quant_identity = QuantIdentity(\n\u001b[0m\u001b[0;32m 4\u001b[0m act_quant=AdvancedActQuantizer, is_clamped=True, scaling_per_output_channel=True)\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\nn\\quant_activation.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, act_quant, return_quant_tensor, **kwargs)\u001b[0m\n\u001b[0;32m 128\u001b[0m \u001b[0mreturn_quant_tensor\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mbool\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 129\u001b[0m **kwargs):\n\u001b[1;32m--> 130\u001b[1;33m QuantNLAL.__init__(\n\u001b[0m\u001b[0;32m 131\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 132\u001b[0m \u001b[0minput_quant\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\nn\\quant_layer.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, act_impl, passthrough_act, input_quant, act_quant, return_quant_tensor, **kwargs)\u001b[0m\n\u001b[0;32m 75\u001b[0m 
\u001b[0mQuantLayerMixin\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mreturn_quant_tensor\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 76\u001b[0m \u001b[0mQuantInputMixin\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minput_quant\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 77\u001b[1;33m QuantNonLinearActMixin.__init__(\n\u001b[0m\u001b[0;32m 78\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 79\u001b[0m \u001b[0mact_impl\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\nn\\mixin\\act.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, act_impl, passthrough_act, act_quant, **kwargs)\u001b[0m\n\u001b[0;32m 182\u001b[0m \u001b[0mact_quant\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mUnion\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mActQuantProxyProtocol\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mType\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mInjector\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 183\u001b[0m **kwargs):\n\u001b[1;32m--> 184\u001b[1;33m QuantActMixin.__init__(\n\u001b[0m\u001b[0;32m 185\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 186\u001b[0m \u001b[0mact_impl\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mact_impl\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\nn\\mixin\\act.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, act_impl, passthrough_act, act_quant, proxy_from_injector_impl, proxy_prefix, kwargs_prefix, **kwargs)\u001b[0m\n\u001b[0;32m 81\u001b[0m \u001b[0mact_quant_injector\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mact_quant_injector\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlet\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mact_impl\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mact_impl\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 82\u001b[0m \u001b[0mact_quant_injector\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mupdate_aqi\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mact_quant_injector\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 83\u001b[1;33m \u001b[0mact_quant\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mproxy_from_injector_impl\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mact_quant_injector\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 84\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 85\u001b[0m \u001b[1;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mact_quant\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mActQuantProxyProtocol\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\proxy\\runtime_quant.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, act_quant_injector)\u001b[0m\n\u001b[0;32m 85\u001b[0m \u001b[1;32mdef\u001b[0m 
\u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mact_quant_injector\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mInjector\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 86\u001b[0m \u001b[0msuper\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mActQuantProxyFromInjector\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mact_quant_injector\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 87\u001b[1;33m \u001b[0mtensor_quant\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mact_quant_injector\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtensor_quant\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 88\u001b[0m \u001b[0mact_impl\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mact_quant_injector\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mact_impl\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 89\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mis_quant_enabled\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtensor_quant\u001b[0m \u001b[1;32mis\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - " \u001b[1;31m[... skipping hidden 1 frame]\u001b[0m\n", - "\u001b[1;32mC:\\ProgramData\\Miniconda3\\envs\\pytorch_latest\\lib\\site-packages\\_dependencies\\this.py\u001b[0m in \u001b[0;36m__call__\u001b[1;34m(self, __self__)\u001b[0m\n\u001b[0;32m 49\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mkind\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;34m\".\"\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 50\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 51\u001b[1;33m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0msymbol\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 52\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mDependencyError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 53\u001b[0m message = (\n", - "\u001b[1;32mc:\\brevitas\\src\\brevitas\\inject\\__init__.py\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(cls, attrname)\u001b[0m\n\u001b[0;32m 115\u001b[0m \u001b[0mcls\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcurrent_attr\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 116\u001b[0m )\n\u001b[1;32m--> 117\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mDependencyError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 118\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 119\u001b[0m \u001b[0mmarker\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mattribute\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0margs\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhave_defaults\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mspec\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", - "\u001b[1;31mDependencyError\u001b[0m: 'AdvancedActQuantizer' can not resolve attribute 'per_channel_broadcastable_shape'" - ] - } - ], + "outputs": [], "source": [ "from brevitas.nn 
import QuantIdentity\n", "\n", @@ -1287,26 +1413,9 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "QuantTensor(value=tensor([[ 0.0100, -0.0100, 0.0100, 0.0100],\n", - " [-0.0100, -0.0100, 0.0100, 0.0100],\n", - " [-0.0100, -0.0100, 0.0100, 0.0100],\n", - " [ 0.0100, -0.0100, 0.0100, 0.0100]], grad_fn=), scale=tensor([[0.0100],\n", - " [0.0100],\n", - " [0.0100],\n", - " [0.0100]], grad_fn=), zero_point=tensor(0.), bit_width=tensor(1.), signed=True, training=True)" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "quant_identity = QuantIdentity(\n", " act_quant=AdvancedActQuantizer, is_clamped=True, scaling_per_output_channel=True,\n", @@ -1318,9 +1427,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We have seen how powerful dependency injection is. In a way, it's even too expressive. For users that are not interesting in building completely custom quantizers it's a bit too much. In particular, it can be hard to make sense of how the various components available under `brevitas.core` can be assembled together according to be practices. \n", + "We have seen how powerful dependency injection is. In a way, it's even too expressive. For users that are not interesting in building completely custom quantizers, it can be hard to make sense of how the various components available under `brevitas.core` can be assembled together according to best practices.\n", "\n", - "In the next tutorial then we are going to take a look at an API implemented with the mechanisms we just saw that builds on top of the components available in `brevitas.core` and exposes easily switchable settings that follows best practices." + "In the next tutorial then we are going to take a look at an API implemented with the mechanisms we just saw that builds on top of the components available in `brevitas.core` and exposes easily switchable settings." ] } ], @@ -1345,4 +1454,4 @@ }, "nbformat": 4, "nbformat_minor": 1 -} +} \ No newline at end of file