oneflow.nn.RMSNorm
class oneflow.nn.RMSNorm(normalized_shape: Union[int, Tuple[int], oneflow.Size], eps: float = 1e-05, elementwise_affine: bool = True, device=None, dtype=None)

Applies Root Mean Square Layer Normalization over a mini-batch of inputs, as described in the paper "Root Mean Square Layer Normalization".
\[y = \frac{x}{\mathrm{RMS}[x]} \mathrm{weight},\text{ where }\mathrm{RMS}[x] = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}}\]

Here n is the number of elements in the normalized dimensions. RMS Layer Normalization has no bias and does not subtract the mean: it only rescales the input and never shifts it.
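As a rough illustration, the formula can be written in NumPy for the single-axis case. This is a minimal sketch, not OneFlow's implementation: the function name rms_norm is hypothetical, and placing eps inside the square root is an assumption, since the formula itself omits eps.

```python
import numpy as np


def rms_norm(x, weight=None, eps=1e-5):
    """Hypothetical helper: RMS-normalize x over its last axis.

    Assumption: eps is added inside the square root.
    """
    x = np.asarray(x, dtype=np.float64)
    # RMS[x]: square root of the mean of squares over the last axis,
    # kept as a size-1 axis so it broadcasts back over x
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    y = x / rms
    # Only a multiplicative weight is applied: no mean subtraction, no bias
    if weight is not None:
        y = y * weight
    return y


x = np.array([[3.0, 4.0], [1.0, -1.0]])
y = rms_norm(x, eps=0.0)
print(y)  # each row of y now has a mean square of 1
```

Because the mean is never subtracted, a row such as [1, -1] is left unchanged: its RMS is already 1.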
The root mean square is computed separately over the last D dimensions, where D is the length of normalized_shape; those dimensions must have exactly the shape given by normalized_shape. \(\mathrm{weight}\) is a learnable affine parameter of shape normalized_shape, created when elementwise_affine is True.

Note

Like Layer Normalization, Root Mean Square Layer Normalization applies a per-element scale when elementwise_affine is True.

This layer uses statistics computed from the input data in both training and evaluation modes.
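The trailing-dimension rule can be sketched in the same way. In the hypothetical NumPy helper rms_norm_nd below (not part of OneFlow), normalized_shape selects the last len(normalized_shape) axes, the trailing shape of the input must match it, and weight has exactly normalized_shape so that it broadcasts over the leading dimensions. Placing eps inside the square root is again an assumption. Because the statistic is recomputed from every input, the sketch keeps no running averages, which agrees with the note that training and evaluation behave the same.

```python
import numpy as np


def rms_norm_nd(x, normalized_shape, weight=None, eps=1e-5):
    """Hypothetical helper: RMS-normalize over the trailing axes given by normalized_shape."""
    if isinstance(normalized_shape, int):
        # A single int is treated as a one-element shape: normalize the last axis
        normalized_shape = (normalized_shape,)
    normalized_shape = tuple(normalized_shape)
    x = np.asarray(x, dtype=np.float64)
    d = len(normalized_shape)
    if x.shape[-d:] != normalized_shape:
        raise ValueError(f"trailing dims {x.shape[-d:]} do not match {normalized_shape}")
    # Reduce over the last d axes only; leading axes are independent samples
    axes = tuple(range(x.ndim - d, x.ndim))
    rms = np.sqrt(np.mean(x * x, axis=axes, keepdims=True) + eps)
    y = x / rms
    if weight is not None:
        # weight has shape normalized_shape and broadcasts over the leading axes
        y = y * weight
    return y


x = np.random.default_rng(0).standard_normal((2, 3, 4, 5))
weight = np.ones((4, 5))
y = rms_norm_nd(x, (4, 5), weight=weight, eps=0.0)
```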
- Parameters
normalized_shape (int or list or oneflow.Size) – input shape from an expected input of size
\[[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]\]
If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
eps – a value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine – a boolean value that, when set to True, gives this module learnable per-element affine parameters (weights) initialized to ones. Default: True
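The two optional arguments can be illustrated with a hypothetical NumPy stand-in, RMSNormSketch, which mirrors the documented defaults (eps=1e-5, elementwise_affine=True, weight initialized to ones) but is not OneFlow's class. Setting weight to None when elementwise_affine is False, and adding eps inside the square root, are assumptions made for this sketch.

```python
import numpy as np


class RMSNormSketch:
    """Hypothetical NumPy stand-in mirroring the documented constructor arguments."""

    def __init__(self, normalized_shape, eps=1e-5, elementwise_affine=True):
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.normalized_shape = tuple(normalized_shape)
        self.eps = eps
        # Per-element scale initialized to ones; assumed to be None when disabled
        self.weight = np.ones(self.normalized_shape) if elementwise_affine else None

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        axes = tuple(range(x.ndim - len(self.normalized_shape), x.ndim))
        # eps > 0 keeps the denominator nonzero, so an all-zero input stays finite
        rms = np.sqrt(np.mean(x * x, axis=axes, keepdims=True) + self.eps)
        y = x / rms
        return y if self.weight is None else y * self.weight


m = RMSNormSketch(3)  # weight starts as ones of shape (3,)
out = m(np.zeros((2, 3)))  # all zeros rather than NaN, because eps > 0
plain = RMSNormSketch(3, elementwise_affine=False)  # no weight at all
```

With eps=0.0 the same all-zero input would produce 0/0, which is exactly the case the eps term guards against.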
- Shape:
Input: \((N, *)\)
Output: \((N, *)\) (same shape as input)
For example:
```python
>>> import numpy as np
>>> import oneflow as flow
>>> input_arr = np.array(
...     [
...         [
...             [[-0.16046895, -1.03667831], [-0.34974465, 0.26505867]],
...             [[-1.24111986, -0.53806001], [1.72426331, 0.43572459]],
...         ],
...         [
...             [[-0.77390957, -0.42610624], [0.16398858, -1.35760343]],
...             [[1.07541728, 0.11008703], [0.26361224, -0.48663723]],
...         ],
...     ],
...     dtype=np.float32,
... )
>>> x = flow.Tensor(input_arr, device="cuda")
>>> m = flow.nn.RMSNorm(2).to(device="cuda")
>>> y = m(x).numpy()
>>> y
array([[[[-0.21632987, -1.3975569 ],
         [-1.127044  ,  0.8541454 ]],
        [[-1.2975204 , -0.5625112 ],
         [ 1.3711083 ,  0.34648165]]],
       [[[-1.2388322 , -0.6820876 ],
         [ 0.16959298, -1.4040003 ]],
        [[ 1.4068495 ,  0.14401469],
         [ 0.6735778 , -1.2434478 ]]]], dtype=float32)
```
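Because the example above needs a CUDA device, a CPU-only cross-check in plain NumPy may be handy. It assumes the weight is still all ones (its documented initial value) and that eps is added inside the square root; under those assumptions it should reproduce the printed output to within float32 rounding. This is a verification sketch, not OneFlow code.

```python
import numpy as np

# Input copied from the OneFlow example above
input_arr = np.array(
    [
        [
            [[-0.16046895, -1.03667831], [-0.34974465, 0.26505867]],
            [[-1.24111986, -0.53806001], [1.72426331, 0.43572459]],
        ],
        [
            [[-0.77390957, -0.42610624], [0.16398858, -1.35760343]],
            [[1.07541728, 0.11008703], [0.26361224, -0.48663723]],
        ],
    ],
    dtype=np.float32,
)

# Output printed by the OneFlow example (RMSNorm(2), default weight of ones)
expected = np.array(
    [
        [
            [[-0.21632987, -1.3975569], [-1.127044, 0.8541454]],
            [[-1.2975204, -0.5625112], [1.3711083, 0.34648165]],
        ],
        [
            [[-1.2388322, -0.6820876], [0.16959298, -1.4040003]],
            [[1.4068495, 0.14401469], [0.6735778, -1.2434478]],
        ],
    ],
    dtype=np.float32,
)

eps = 1e-5  # documented default
# normalized_shape=2, so the statistic is taken over the last axis only
x64 = input_arr.astype(np.float64)
rms = np.sqrt(np.mean(x64 ** 2, axis=-1, keepdims=True) + eps)
y = (x64 / rms).astype(np.float32)
print(np.abs(y - expected).max())  # largest deviation from the OneFlow output
```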