oneflow.nn.RMSNorm

class oneflow.nn.RMSNorm(normalized_shape: Union[int, Tuple[int], oneflow.Size], eps: float = 1e-05, elementwise_affine: bool = True, device=None, dtype=None)

Applies Root Mean Square Layer Normalization over a mini-batch of inputs, as described in the paper "Root Mean Square Layer Normalization".

\[y = \frac{x}{\mathrm{RMS}[x]} \cdot \mathrm{weight},\text{ where }\mathrm{RMS}[x] = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}}\]

RMS Layer Normalization has no bias and does not subtract the mean: it only rescales the input and never shifts it.

The root mean square is calculated over the last D dimensions, where D is the number of dimensions in normalized_shape; these trailing dimensions must have exactly the shape given by normalized_shape. \(\mathrm{weight}\) is a learnable per-element scale parameter of shape normalized_shape, present when elementwise_affine is True.
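The computation described above can be sketched in plain NumPy. This is an illustrative reference, not the OneFlow implementation: the function name `rms_norm_reference` is hypothetical, and the placement of `eps` (added to the mean of squares inside the square root) is an assumption, chosen because it agrees with the example output shown on this page.

```python
import numpy as np


def rms_norm_reference(x, normalized_shape, weight=None, eps=1e-5):
    """NumPy sketch of RMS normalization over the trailing dimensions.

    Assumption: eps is added to the mean of squares inside the square root.
    """
    if isinstance(normalized_shape, int):
        normalized_shape = (normalized_shape,)
    normalized_shape = tuple(normalized_shape)

    x = np.asarray(x, dtype=np.float64)
    if x.shape[-len(normalized_shape):] != normalized_shape:
        raise ValueError("trailing dimensions of x must equal normalized_shape")

    # Reduce over the last len(normalized_shape) axes only.
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean_square = np.mean(x * x, axis=axes, keepdims=True)

    # Scale only: no mean subtraction and no bias.
    y = x / np.sqrt(mean_square + eps)
    if weight is not None:
        y = y * np.asarray(weight, dtype=np.float64)
    return y


# Two rows taken from the input of the example on this page.
x = np.array([[-0.16046895, -1.03667831], [-0.34974465, 0.26505867]])
y = rms_norm_reference(x, 2)
print(y)
```

With the default all-ones weight, the printed rows should agree with the first two rows of the example output to about five decimal places.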

Note

Like Layer Normalization, Root Mean Square Layer Normalization applies per-element scale with elementwise_affine.

This layer uses statistics computed from input data in both training and evaluation modes.

Parameters
  • normalized_shape (int or list or oneflow.Size) –

    input shape from an expected input of size

    \[[* \times \text{normalized_shape}[0] \times \text{normalized_shape}[1] \times \ldots \times \text{normalized_shape}[-1]]\]

    If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  • eps – a value added to the denominator for numerical stability. Default: 1e-5

  • elementwise_affine – a boolean value; when set to True, this module has a learnable per-element weight initialized to ones. Default: True.
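To make the role of normalized_shape concrete, the following sketch works out which input axes are reduced for a given normalized_shape. The helper `reduction_axes` is hypothetical and not part of OneFlow; it only mirrors the rule stated above, that normalized_shape must match the trailing dimensions of the input.

```python
def reduction_axes(input_shape, normalized_shape):
    """Return the input axes covered by normalized_shape (hypothetical helper)."""
    if isinstance(normalized_shape, int):
        # A single integer is treated as a singleton list.
        normalized_shape = (normalized_shape,)
    normalized_shape = tuple(normalized_shape)
    input_shape = tuple(input_shape)

    k = len(normalized_shape)
    if k > len(input_shape) or input_shape[len(input_shape) - k:] != normalized_shape:
        raise ValueError(
            f"input shape {input_shape} does not end with {normalized_shape}"
        )
    return tuple(range(len(input_shape) - k, len(input_shape)))


print(reduction_axes((2, 2, 2, 2), 2))       # (3,)
print(reduction_axes((2, 2, 2, 2), (2, 2)))  # (2, 3)
```

In the first call one root mean square is computed per length-2 row; in the second, one per trailing 2 x 2 block, and the weight would then have shape (2, 2).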

Shape:
  • Input: \((N, *)\)

  • Output: \((N, *)\) (same shape as input)

For example:

>>> import numpy as np
>>> import oneflow as flow

>>> input_arr = np.array(
...     [
...         [
...             [[-0.16046895, -1.03667831], [-0.34974465, 0.26505867]],
...             [[-1.24111986, -0.53806001], [1.72426331, 0.43572459]],
...         ],
...         [
...             [[-0.77390957, -0.42610624], [0.16398858, -1.35760343]],
...             [[1.07541728, 0.11008703], [0.26361224, -0.48663723]],
...         ],
...     ],
...     dtype=np.float32,
... )

>>> x = flow.Tensor(input_arr, device="cuda")
>>> m = flow.nn.RMSNorm(2).to(device="cuda")
>>> y = m(x).numpy()
>>> y
array([[[[-0.21632987, -1.3975569 ],
         [-1.127044  ,  0.8541454 ]],

        [[-1.2975204 , -0.5625112 ],
         [ 1.3711083 ,  0.34648165]]],


       [[[-1.2388322 , -0.6820876 ],
         [ 0.16959298, -1.4040003 ]],

        [[ 1.4068495 ,  0.14401469],
         [ 0.6735778 , -1.2434478 ]]]], dtype=float32)
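
One way to sanity-check the result, sketched here with NumPy only: because the weight is initialized to ones, every normalized row of the output should itself have a root mean square close to 1. The array below is copied from the output above; the tolerance is an arbitrary choice that leaves room for the effect of eps.

```python
import numpy as np

# Output values copied from the example above.
y = np.array(
    [
        [
            [[-0.21632987, -1.3975569], [-1.127044, 0.8541454]],
            [[-1.2975204, -0.5625112], [1.3711083, 0.34648165]],
        ],
        [
            [[-1.2388322, -0.6820876], [0.16959298, -1.4040003]],
            [[1.4068495, 0.14401469], [0.6735778, -1.2434478]],
        ],
    ],
    dtype=np.float32,
)

# Root mean square over the normalized (last) dimension of each row.
rms = np.sqrt(np.mean(y.astype(np.float64) ** 2, axis=-1))
print(rms.shape)                         # (2, 2, 2)
print(np.allclose(rms, 1.0, atol=1e-3))  # True
```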