# oneflow.nn.FakeQuantization

class oneflow.nn.FakeQuantization(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric')

Simulates the quantize and dequantize operations at training time.

The output will be computed as:

if quantization_scheme == "symmetric":

\begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit - 1} - 1\\& quant\_min = -quant\_max\\& clamp(round(x / scale), quant\_min, quant\_max) * scale\end{aligned}\end{align}

elif quantization_scheme == "affine":

\begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit} - 1\\& quant\_min = 0\\& (clamp(round(x / scale + zero\_point), quant\_min, quant\_max) - zero\_point) * scale\end{aligned}\end{align}
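The two formulas above can be reproduced with plain NumPy. The helper below is an illustrative sketch of the math only, not OneFlow's kernel; the name `fake_quant` is our own:

```python
import numpy as np

def fake_quant(x, scale, zero_point=0.0, bit=8, scheme="symmetric"):
    # Illustrative re-implementation of the formulas above (not OneFlow's kernel).
    if scheme == "symmetric":
        quant_max = 2 ** (bit - 1) - 1      # e.g. 127 for 8-bit signed
        quant_min = -quant_max
        return np.clip(np.round(x / scale), quant_min, quant_max) * scale
    else:  # "affine"
        quant_max = 2 ** bit - 1            # e.g. 255 for 8-bit unsigned
        quant_min = 0
        q = np.clip(np.round(x / scale + zero_point), quant_min, quant_max)
        return (q - zero_point) * scale

x = np.array([-1.0, -0.3, 0.0, 0.5, 1.0], dtype=np.float32)
print(fake_quant(x, scale=1.0 / 127))          # symmetric, 8-bit
print(fake_quant(x, scale=1.0 / 255, scheme="affine"))
```

Note that values outside the representable range are clamped: with `scale = 1/127`, any symmetric input above 1.0 maps back to exactly 1.0.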
Parameters

• input (oneflow.Tensor) – the input value(s), in oneflow.float32.

• scale (oneflow.Tensor) – quantization scale.

• zero_point (oneflow.Tensor) – quantization zero point.

• quantization_bit (int) – quantize input to uintX / intX, where X is in the range [2, 8]. Defaults to 8.

• quantization_scheme (str) – "symmetric" or "affine", i.e. quantize to a signed or unsigned integer. Defaults to "symmetric".

• quantization_formula (str) – "google" or "cambricon". Defaults to "google".

Returns

The input tensor after the quantize and dequantize operations.

Return type

oneflow.Tensor

For example:

>>> import numpy as np
>>> import oneflow as flow

>>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)

>>> input_tensor = flow.tensor(
...    weight, dtype=flow.float32
... )

>>> quantization_formula = "google"
>>> quantization_bit = 8
>>> quantization_scheme = "symmetric"
>>> per_layer_quantization = True

>>> min_max_observer = flow.nn.MinMaxObserver(quantization_formula=quantization_formula, quantization_bit=quantization_bit,
... quantization_scheme=quantization_scheme, per_layer_quantization=per_layer_quantization)
>>> fake_quantization = flow.nn.FakeQuantization(quantization_formula=quantization_formula, quantization_bit=quantization_bit,
... quantization_scheme=quantization_scheme)

>>> scale, zero_point = min_max_observer(
...    input_tensor,
... )

>>> output_tensor = fake_quantization(
...    input_tensor,
...    scale,
...    zero_point,
... )
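For the "google" symmetric formula, the observer's per-layer scale is derived from the maximum absolute value of the tensor. The sketch below assumes that max-abs calibration rule (verify against your installed OneFlow version) and checks that the fake-quantized output stays within one quantization step of the input:

```python
import numpy as np

# Sketch of per-layer symmetric scale derivation (max-abs calibration, as
# commonly described for the "google" formula; OneFlow's MinMaxObserver
# should yield an equivalent value, but this is an assumption, not its code).
weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)
quantization_bit = 8
quant_max = 2 ** (quantization_bit - 1) - 1          # 127
scale = np.abs(weight).max() / quant_max             # per-layer max-abs scale

# Applying the symmetric fake-quantization formula from above:
out = np.clip(np.round(weight / scale), -quant_max, quant_max) * scale
print(np.abs(out - weight).max() <= 0.5 * scale)     # error bounded by half a step
```

Because the scale is chosen so that `weight / scale` never exceeds `quant_max`, no value is clamped and the only error is the rounding error.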

__init__(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric') → None

Calls super().__setattr__('a', a) instead of the typical self.a = a to avoid Module.__setattr__ overhead. Module's __setattr__ has special handling for parameters, submodules, and buffers but simply calls into super().__setattr__ for all other attributes.

Methods

| Method | Description |
| --- | --- |
| `__call__(*args, **kwargs)` | Call self as a function. |
| `__delattr__(name)` | Implement delattr(self, name). |
| `__dir__()` | Default dir() implementation. |
| `__eq__(value, /)` | Return self==value. |
| `__format__(format_spec, /)` | Default object formatter. |
| `__ge__(value, /)` | Return self>=value. |
| `__getattr__(name)` | |
| `__getattribute__(name, /)` | Return getattr(self, name). |
| `__getstate__()` | |
| `__gt__(value, /)` | Return self>value. |
| `__hash__()` | Return hash(self). |
| `__init__([quantization_formula, …])` | Calls super().__setattr__('a', a) instead of the typical self.a = a to avoid Module.__setattr__ overhead. |
| `__init_subclass__` | This method is called when a class is subclassed. |
| `__le__(value, /)` | Return self<=value. |
| `__lt__(value, /)` | Return self<value. |

Attributes

| Attribute | Description |
| --- | --- |
| `_grad_t` | alias of Union[Tuple[oneflow.Tensor, …], oneflow.Tensor] |