oneflow.nn.FakeQuantization

class oneflow.nn.FakeQuantization(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric')

Simulates the quantize and dequantize operations at training time.

The output will be computed as:

if quantization_scheme == "symmetric":

\[
\begin{aligned}
& quant\_max = 2^{quantization\_bit - 1} - 1\\
& quant\_min = -quant\_max\\
& clamp(round(x / scale), quant\_min, quant\_max) * scale
\end{aligned}
\]

elif quantization_scheme == "affine":

\[
\begin{aligned}
& quant\_max = 2^{quantization\_bit} - 1\\
& quant\_min = 0\\
& (clamp(round(x / scale + zero\_point), quant\_min, quant\_max) - zero\_point) * scale
\end{aligned}
\]
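Both branches reduce to plain round-clamp-rescale arithmetic. The following NumPy sketch of the two schemes is illustrative only; it mirrors the "google" formulas above, not OneFlow's actual kernel:

>>> import numpy as np
>>> def fake_quantize_ref(x, scale, zero_point, quantization_bit=8,
...                       quantization_scheme="symmetric"):
...     # Symmetric: signed range [-(2^(bit-1) - 1), 2^(bit-1) - 1], no zero point.
...     if quantization_scheme == "symmetric":
...         quant_max = 2.0 ** (quantization_bit - 1) - 1
...         return np.clip(np.round(x / scale), -quant_max, quant_max) * scale
...     # Affine: unsigned range [0, 2^bit - 1], shifted by zero_point.
...     quant_max = 2.0 ** quantization_bit - 1
...     q = np.clip(np.round(x / scale + zero_point), 0.0, quant_max)
...     return (q - zero_point) * scale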
Parameters
  • input (oneflow.Tensor) – the input value(s), in oneflow.float32.

  • scale (oneflow.Tensor) – quantization scale.

  • zero_point (oneflow.Tensor) – quantization zero_point.

  • quantization_bit (int) – Quantize input to intX / uintX, where X is in the range [2, 8]. Defaults to 8.

  • quantization_scheme (str) – "symmetric" or "affine": quantize to a signed or unsigned integer range. Defaults to "symmetric".

  • quantization_formula (str) – Supports "google" or "cambricon". Defaults to "google".

Returns

Input tensor after quantize and dequantize operations.

Return type

oneflow.Tensor

For example:

>>> import numpy as np
>>> import oneflow as flow

>>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)

>>> input_tensor = flow.tensor(
...    weight, dtype=flow.float32
... )

>>> quantization_bit = 8
>>> quantization_scheme = "symmetric"
>>> quantization_formula = "google"
>>> per_layer_quantization = True

>>> min_max_observer = flow.nn.MinMaxObserver(
...    quantization_formula=quantization_formula,
...    quantization_bit=quantization_bit,
...    quantization_scheme=quantization_scheme,
...    per_layer_quantization=per_layer_quantization,
... )
>>> fake_quantization = flow.nn.FakeQuantization(
...    quantization_formula=quantization_formula,
...    quantization_bit=quantization_bit,
...    quantization_scheme=quantization_scheme,
... )

>>> scale, zero_point = min_max_observer(
...    input_tensor,
... )

>>> output_tensor = fake_quantization(
...    input_tensor,
...    scale,
...    zero_point,
... )
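
As a quick sanity check (not part of the original docstring), the symmetric output should agree with the formula above applied by hand; a minimal sketch, assuming the "google" symmetric kernel follows that formula exactly:

>>> quant_max = 2 ** (quantization_bit - 1) - 1
>>> manual = np.clip(np.round(weight / scale.numpy()), -quant_max, quant_max) * scale.numpy()
>>> bool(np.allclose(output_tensor.numpy(), manual))
True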
__init__(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric') → None

Calls super().__setattr__('a', a) instead of the typical self.a = a to avoid Module.__setattr__ overhead. Module's __setattr__ has special handling for parameters, submodules, and buffers but simply calls into super().__setattr__ for all other attributes.

Methods

__call__(*args, **kwargs)

Call self as a function.

__delattr__(name)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattr__(name)

__getattribute__(name, /)

Return getattr(self, name).

__getstate__()

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__([quantization_formula, …])

Calls super().__setattr__('a', a) instead of the typical self.a = a to avoid Module.__setattr__ overhead.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

Create and return a new object.

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

__setstate__(state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_apply(fn)

_get_backward_hooks()

Returns the backward hooks for use in the call function.

_get_name()

_load_from_state_dict(state_dict, prefix, …)

_maybe_warn_non_full_backward_hook(args, …)

_named_members(get_members_fn[, prefix, recurse])

_register_load_state_dict_pre_hook(hook[, …])

These hooks will be called with arguments: state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs, before loading state_dict into self.

_register_state_dict_hook(hook)

These hooks will be called with arguments: self, state_dict, prefix, local_metadata, after the state_dict of self is set.

_save_to_state_dict(destination, prefix, …)

_shallow_repr()

_to_memory_format(memory_format)

Casts the parameters and buffers in this module to another memory format.

add_module(name, module)

Adds a child module to the current module.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

buffers([recurse])

Returns an iterator over module buffers.

children()

Returns an iterator over immediate children modules.

cpu()

Moves all model parameters and buffers to the CPU.

cuda([device])

Moves all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Sets the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(input, scale, zero_point)

get_parameter(target)

Return the parameter referenced by target.

get_submodule(target)

Get a submodule according to its name.

half()

Casts all floating point parameters and buffers to half datatype.

load_state_dict(state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

make_contiguous_params_group()

Get the contiguous parameter group after the whole module has been created.

modules()

Returns an iterator over all modules in the network.

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

register_backward_hook(hook)

Registers a backward hook on the module.

register_buffer(name, tensor[, persistent])

Adds a buffer to the module.

register_forward_hook(hook)

Registers a forward hook on the module.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

register_full_backward_hook(hook)

Registers a backward hook on the module.

register_parameter(name, param)

Adds a parameter to the module.

register_state_dict_pre_hook(hook)

These hooks will be called with arguments: self, prefix, and keep_vars before calling state_dict on self.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

state_dict([destination, prefix, keep_vars])

Returns a dictionary containing a whole state of the module.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

to_consistent(*args, **kwargs)

This interface is no longer available; please use oneflow.nn.Module.to_global() instead.

to_empty(*, device)

Moves the parameters and buffers to the specified device without copying storage.

to_global([placement, sbp])

Convert the parameters and buffers to global.

to_local()

to_memory_format(memory_format)

train([mode])

Sets the module in training mode.

zero_grad([set_to_none])

Sets gradients of all model parameters to zero.

Attributes

_grad_t

alias of Union[Tuple[oneflow.Tensor, …], oneflow.Tensor]