oneflow.nn
Operators for neural networks
Copyright 2020 The OneFlow Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
class oneflow.nn.AdaptiveAvgPool1d(output_size: Union[int, Tuple[int]])
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
The output size is H, for any input size. The number of output features is equal to the number of input planes.
- Parameters
output_size – the target output size H
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> m = nn.AdaptiveAvgPool1d(5)
>>> input = flow.Tensor(np.random.randn(1, 64, 8))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 5])
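To make the "output size is H for any input size" behaviour concrete, here is a minimal pure-Python sketch of 1D adaptive average pooling on a single plane. It assumes PyTorch-style bin boundaries (bin i covers indices floor(i·L/H) up to ceil((i+1)·L/H)). The helper `adaptive_avg_pool1d` is hypothetical and is not OneFlow's kernel.

```python
def adaptive_avg_pool1d(signal, output_size):
    """Average a 1D list `signal` into `output_size` bins.

    Assumption: PyTorch-style bins, where bin i spans
    [floor(i * L / H), ceil((i + 1) * L / H)) with L = len(signal), H = output_size.
    Neighbouring bins may overlap when L is not a multiple of H.
    """
    length = len(signal)
    pooled = []
    for i in range(output_size):
        start = (i * length) // output_size  # floor division
        end = ((i + 1) * length + output_size - 1) // output_size  # ceiling division
        window = signal[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


print(adaptive_avg_pool1d([1, 2, 3, 4, 5, 6, 7, 8], 5))  # → [1.5, 3.0, 4.5, 6.0, 7.5]
```

Eight inputs are mapped to five outputs, so some windows hold two elements and others three.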
-
class oneflow.nn.AdaptiveAvgPool2d(output_size)
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.
- Parameters
output_size – the target output size of the image, of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can each be either an int or None, where None means the size will be the same as that of the input.
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> m = nn.AdaptiveAvgPool2d((5,7))
>>> input = flow.Tensor(np.random.randn(1, 64, 8, 9))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 5, 7])
>>> m = nn.AdaptiveAvgPool2d(7)
>>> input = flow.Tensor(np.random.randn(1, 64, 10, 9))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 7, 7])
>>> m = nn.AdaptiveAvgPool2d((None, 7))
>>> input = flow.Tensor(np.random.randn(1, 64, 10, 9))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 10, 7])
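The sketch below extends the same binning idea to one H x W plane, including the None case, where a dimension keeps its input size. As in the 1D case, it assumes PyTorch-style bin boundaries. `_bin` and `adaptive_avg_pool2d` are hypothetical helpers, not OneFlow APIs.

```python
def _bin(i, in_size, out_size):
    """Return the [start, end) index range of output bin i (PyTorch-style bins, assumed)."""
    start = (i * in_size) // out_size
    end = ((i + 1) * in_size + out_size - 1) // out_size
    return start, end


def adaptive_avg_pool2d(plane, output_size):
    """Adaptive average pooling of one H_in x W_in plane given as a list of rows.

    output_size is an int or an (H, W) tuple; a None entry keeps the input size.
    """
    if isinstance(output_size, int):
        output_size = (output_size, output_size)
    h_in, w_in = len(plane), len(plane[0])
    h_out = h_in if output_size[0] is None else output_size[0]
    w_out = w_in if output_size[1] is None else output_size[1]
    result = []
    for i in range(h_out):
        r0, r1 = _bin(i, h_in, h_out)
        row = []
        for j in range(w_out):
            c0, c1 = _bin(j, w_in, w_out)
            values = [plane[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(values) / len(values))
        result.append(row)
    return result


plane = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(adaptive_avg_pool2d(plane, 2))  # → [[2.5, 4.5], [10.5, 12.5]]
print(adaptive_avg_pool2d(plane, (None, 1)))  # → [[1.5], [5.5], [9.5], [13.5]]
```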
-
class oneflow.nn.AdaptiveAvgPool3d(output_size)
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.
- Parameters
output_size – the target output size of the form D x H x W. Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can each be either an int or None, where None means the size will be the same as that of the input.
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> m = nn.AdaptiveAvgPool3d((5,7,9))
>>> input = flow.Tensor(np.random.randn(1, 64, 8, 9, 10))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 5, 7, 9])
>>> m = nn.AdaptiveAvgPool3d(7)
>>> input = flow.Tensor(np.random.randn(1, 64, 10, 9, 8))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 7, 7, 7])
>>> m = nn.AdaptiveAvgPool3d((7, None, None))
>>> input = flow.Tensor(np.random.randn(1, 64, 10, 9, 8))
>>> output = m(input)
>>> output.size()
oneflow.Size([1, 64, 7, 9, 8])
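The output shapes in these examples follow one rule: N and C pass through unchanged, an int is broadcast to every spatial dimension, and a None entry keeps the input's size. A small hypothetical helper, `adaptive_output_shape`, reproduces the shapes above. It also works for the 2D case.

```python
def adaptive_output_shape(input_shape, output_size):
    """Output shape of an adaptive pooling layer (hypothetical helper).

    input_shape: (N, C, *spatial). output_size: an int, or a tuple with one
    entry per spatial dimension where None keeps that input dimension.
    """
    batch_and_channels, spatial = tuple(input_shape[:2]), tuple(input_shape[2:])
    if isinstance(output_size, int):
        output_size = (output_size,) * len(spatial)
    if len(output_size) != len(spatial):
        raise ValueError("output_size must have one entry per spatial dimension")
    resolved = tuple(i if o is None else o for i, o in zip(spatial, output_size))
    return batch_and_channels + resolved


print(adaptive_output_shape((1, 64, 8, 9, 10), (5, 7, 9)))  # → (1, 64, 5, 7, 9)
print(adaptive_output_shape((1, 64, 10, 9, 8), 7))  # → (1, 64, 7, 7, 7)
print(adaptive_output_shape((1, 64, 10, 9, 8), (7, None, None)))  # → (1, 64, 7, 9, 8)
```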
-
class oneflow.nn.AvgPool1d(kernel_size: Union[int, Tuple[int, int]], stride: Optional[Union[int, Tuple[int, int]]] = None, padding: Union[int, Tuple[int, int]] = 0, ceil_mode: bool = False, count_include_pad: bool = True)
Applies a 1D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size \((N, C, L)\), output \((N, C, L_{out})\) and kernel_size \(k\) can be precisely described as:
\[out(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} input(N_i, C_j, \text{stride} \times l + m)\]
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. The parameters kernel_size, stride and padding can each be an int or a one-element tuple.
Note
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Parameters
kernel_size – the size of the window.
stride – the stride of the window. Default value is kernel_size.
padding – implicit zero padding to be added on both sides.
ceil_mode – when True, will use ceil instead of floor to compute the output shape.
count_include_pad – when True, will include the zero-padding in the averaging calculation.
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.AvgPool1d(kernel_size=3, padding=1, stride=1)
>>> x = flow.tensor(np.random.randn(1, 4, 4))
>>> y = m(x)
>>> y.shape
oneflow.Size([1, 4, 4])
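The formula and the count_include_pad flag can be checked against a naive pure-Python version that works on one (N_i, C_j) plane. The sketch is hypothetical. It covers floor mode only (ceil_mode=False) and shows how the divisor changes when padded zeros are excluded from the average.

```python
def avg_pool1d(signal, kernel_size, stride=None, padding=0, count_include_pad=True):
    """Naive 1D average pooling of one plane (floor mode only, i.e. ceil_mode=False)."""
    stride = kernel_size if stride is None else stride  # default stride = kernel_size
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    # True where the padded sequence holds a real input element.
    is_real = [False] * padding + [True] * len(signal) + [False] * padding
    out_len = (len(signal) + 2 * padding - kernel_size) // stride + 1
    pooled = []
    for pos in range(out_len):
        start = stride * pos
        window = padded[start:start + kernel_size]
        if count_include_pad:
            divisor = kernel_size
        else:
            divisor = sum(is_real[start:start + kernel_size])
        pooled.append(sum(window) / divisor)
    return pooled


x = [1.0, 2.0, 3.0, 4.0]
print(avg_pool1d(x, kernel_size=3, stride=1, padding=1))  # last value is 7/3
print(avg_pool1d(x, kernel_size=3, stride=1, padding=1, count_include_pad=False))
# → [1.5, 2.0, 3.0, 3.5]
```

With kernel_size=3, padding=1 and stride=1 the output length equals the input length, which matches the shape in the example above.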
-
class oneflow.nn.AvgPool2d(kernel_size: Union[int, Tuple[int, int]], stride: Optional[Union[int, Tuple[int, int]]] = None, padding: Union[int, Tuple[int, int]] = 0, ceil_mode: bool = False, count_include_pad: bool = True, divisor_override: int = 0)
Performs 2D average pooling on the input.
In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]
- Parameters
kernel_size (Union[int, Tuple[int, int]]) – An int or list of ints that has length 1, 2. The size of the window for each dimension of the input Tensor.
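stride (Union[int, Tuple[int, int]], optional) – An int or list of ints that has length 1, 2. The stride of the sliding window for each dimension of the input Tensor. Default value is kernel_size.
padding (Union[int, Tuple[int, int]]) – An int or list of ints that has length 1, 2. Implicit zero padding to be added on both sides.
ceil_mode (bool, default: False) – When True, will use ceil instead of floor to compute the output shape.
count_include_pad (bool, default: True) – When True, will include the zero-padding in the averaging calculation.
divisor_override (int, default: 0) – If set to a non-zero value, it will be used as the divisor; otherwise kH * kW will be used.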
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.AvgPool2d(kernel_size=3, padding=1, stride=1)
>>> x = flow.tensor(np.random.randn(1, 4, 4, 4))
>>> y = m(x)
>>> y.shape
oneflow.Size([1, 4, 4, 4])
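The double sum in the formula above translates directly into a naive loop over one H x W plane. The sketch is hypothetical: it skips padding and ceil mode, and it assumes that a divisor_override of 0 (the signature's default) means "not set".

```python
def avg_pool2d(plane, kernel_size, stride=None, divisor_override=0):
    """Naive 2D average pooling of one H x W plane (no padding, floor mode).

    Implements out[h][w] = sum_{m, n} plane[sH*h + m][sW*w + n] / divisor,
    where divisor is kH * kW unless divisor_override is non-zero
    (assumption: 0, the signature's default, means "not set").
    """
    kh, kw = kernel_size
    sh, sw = kernel_size if stride is None else stride
    h_out = (len(plane) - kh) // sh + 1
    w_out = (len(plane[0]) - kw) // sw + 1
    divisor = divisor_override if divisor_override else kh * kw
    return [
        [
            sum(plane[sh * h + m][sw * w + n] for m in range(kh) for n in range(kw)) / divisor
            for w in range(w_out)
        ]
        for h in range(h_out)
    ]


plane = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(avg_pool2d(plane, (2, 2)))  # → [[2.5, 4.5], [10.5, 12.5]]
print(avg_pool2d(plane, (2, 2), divisor_override=1))  # → [[10.0, 18.0], [42.0, 50.0]]
```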
-
class oneflow.nn.AvgPool3d(kernel_size: Union[int, Tuple[int, int, int]], stride: Optional[Union[int, Tuple[int, int, int]]] = None, padding: Union[int, Tuple[int, int, int]] = 0, ceil_mode: bool = False, count_include_pad: bool = True, divisor_override: int = 0)
Applies a 3D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and kernel_size \((kD, kH, kW)\) can be precisely described as:
\[out(N_i, C_j, d, h, w) = \frac{1}{kD * kH * kW} \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times d + k, stride[1] \times h + m, stride[2] \times w + n)\]
If padding is non-zero, then the input is implicitly zero-padded on both sides of each of the three spatial dimensions for padding number of points.
Note
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Parameters
kernel_size – the size of the window.
stride – the stride of the window. Default value is kernel_size.
padding – implicit zero padding to be added on both sides of each of the three spatial dimensions.
ceil_mode – when True, will use ceil instead of floor to compute the output shape.
count_include_pad – when True, will include the zero-padding in the averaging calculation.
divisor_override – if specified, it will be used as divisor, otherwise kernel_size will be used.
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\_size}[2]}{\text{stride}[2]} + 1\right\rfloor\]
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.AvgPool3d(kernel_size=(2,2,2),padding=(0,0,0),stride=(1,1,1))
>>> x = flow.tensor(np.random.randn(9, 7, 11, 32, 20))
>>> y = m(x)
>>> y.shape
oneflow.Size([9, 7, 10, 31, 19])
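The Shape formulas reduce to one per-dimension rule. The hypothetical helper below implements it and reproduces the example's output shape. The ceil-mode branch is an assumption modelled on PyTorch: it rounds up, then drops the last window if that window would start inside the right padding, which matches the Note above.

```python
def pool_output_size(in_size, kernel_size, stride, padding=0, ceil_mode=False):
    """Output length of one pooled dimension.

    Floor mode follows the Shape formula. The ceil-mode branch is an
    assumption modelled on PyTorch: round up, then drop the last window if
    it would start inside the right padding.
    """
    span = in_size + 2 * padding - kernel_size
    if not ceil_mode:
        return span // stride + 1
    out = -(-span // stride) + 1  # ceiling division
    if (out - 1) * stride >= in_size + padding:
        out -= 1
    return out


# Input (9, 7, 11, 32, 20) with kernel 2, stride 1 and padding 0, as in the example.
print([pool_output_size(n, 2, 1) for n in (11, 32, 20)])  # → [10, 31, 19]
print(pool_output_size(5, 2, 2), pool_output_size(5, 2, 2, ceil_mode=True))  # → 2 3
```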
-
class oneflow.nn.BCELoss(weight: Optional[oneflow.Tensor] = None, reduction: str = 'mean')
This operator computes the binary cross entropy loss.
The equation is:
if reduction = “none”:
\[out_i = -weight_i * (Target_i*log(Input_i) + (1-Target_i)*log(1-Input_i))\]
if reduction = “mean”:
\[out = -\frac{1}{n}\sum_{i=1}^n weight_i * (Target_i*log(Input_i) + (1-Target_i)*log(1-Input_i))\]
if reduction = “sum”:
\[out = -\sum_{i=1}^n weight_i * (Target_i*log(Input_i) + (1-Target_i)*log(1-Input_i))\]
- Parameters
weight (oneflow.Tensor, optional) – The manual rescaling weight to the loss. Defaults to None, which is equivalent to a weight of 1 for every element.
reduction (str, optional) – The reduce type, it can be one of “none”, “mean”, “sum”. Defaults to “mean”.
Attention
Input values must lie in the range (0, 1); otherwise the loss function may return nan.
- Returns
The result Tensor.
- Return type
oneflow.Tensor
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.Tensor(np.array([[1.2, 0.2, -0.3], [0.7, 0.6, -2]]).astype(np.float32))
>>> target = flow.Tensor(np.array([[0, 1, 0], [1, 0, 1]]).astype(np.float32))
>>> weight = flow.Tensor(np.array([[2, 2, 2], [2, 2, 2]]).astype(np.float32))
>>> activation = flow.nn.Sigmoid()
>>> sigmoid_input = activation(input)
>>> m = flow.nn.BCELoss(weight, reduction="none")
>>> out = m(sigmoid_input, target)
>>> out
tensor([[2.9266, 1.1963, 1.1087],
        [0.8064, 2.0750, 4.2539]], dtype=oneflow.float32)
>>> m_sum = flow.nn.BCELoss(weight, reduction="sum")
>>> out = m_sum(sigmoid_input, target)
>>> out
tensor(12.3668, dtype=oneflow.float32)
>>> m_mean = flow.nn.BCELoss(weight, reduction="mean")
>>> out = m_mean(sigmoid_input, target)
>>> out
tensor(2.0611, dtype=oneflow.float32)
>>> m_default = flow.nn.BCELoss()
>>> out = m_default(sigmoid_input, target)
>>> out
tensor(1.0306, dtype=oneflow.float32)
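As a cross-check of the equations, here is a pure-Python sketch of the same loss on flat lists, fed with the flattened inputs of the example. `bce_loss` and `sigmoid` are illustrative helpers, not OneFlow APIs. The sketch reproduces the example's sum and unweighted mean.

```python
import math


def bce_loss(probs, targets, weights=None, reduction="mean"):
    """Binary cross entropy on flat lists; every prob must lie in (0, 1)."""
    if weights is None:
        weights = [1.0] * len(probs)  # weight None behaves like a weight of 1
    losses = [
        -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t, w in zip(probs, targets, weights)
    ]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported reduction: {reduction!r}")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# The example's inputs, flattened row by row.
logits = [1.2, 0.2, -0.3, 0.7, 0.6, -2.0]
targets = [0, 1, 0, 1, 0, 1]
probs = [sigmoid(v) for v in logits]
print(round(bce_loss(probs, targets, [2.0] * 6, "sum"), 4))  # → 12.3668
print(round(bce_loss(probs, targets), 4))  # → 1.0306
```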
-
class oneflow.nn.BCEWithLogitsLoss(weight: Optional[oneflow.Tensor] = None, reduction: str = 'mean', pos_weight: Optional[oneflow.Tensor] = None)
This operator combines Sigmoid and BCELoss in a single operation. For numerical stability, it evaluates the combined expression directly instead of applying a separate Sigmoid layer followed by BCELoss.
The equation is:
if reduction = "none":
\[out_i = -weight_i * [pos\_weight_i * y_i * \log\sigma(x_i) + (1 - y_i) * \log(1 - \sigma(x_i))]\]
if reduction = "mean":
\[out = -\frac{1}{n}\sum_{i=1}^n weight_i * [pos\_weight_i * y_i * \log\sigma(x_i) + (1 - y_i) * \log(1 - \sigma(x_i))]\]
if reduction = "sum":
\[out = -\sum_{i=1}^n weight_i * [pos\_weight_i * y_i * \log\sigma(x_i) + (1 - y_i) * \log(1 - \sigma(x_i))]\]
- Parameters
weight (Tensor, optional) – The manual rescaling weight to the loss. Default: None
size_average (bool, optional) – Deprecated (see reduction). Default: True
reduce (bool, optional) – Deprecated (see reduction). Default: True
reduction (str, optional) – The reduce type, which can be one of "none", "mean" or "sum". 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: "mean"
pos_weight (Tensor, optional) – The manual rescaling weight to the positive examples. Default: None
- Shape:
Input: \((N,*)\) where * means any number of additional dimensions
Target: \((N,*)\), same shape as the input
Output: scalar. If reduction is "none", then \((N,*)\), same shape as input.
For example:
>>> import oneflow as flow
>>> input = flow.tensor([[1.2, 0.2, -0.3], [0.7, 0.6, -2], [0.7, 0.6, -2]], dtype=flow.float32)
>>> target = flow.tensor([[0, 1, 0], [1, 0, 1], [1, 0, 1]], dtype=flow.float32)
>>> weight = flow.tensor([[2, 2, 2], [2, 2, 2], [2, 2, 2]], dtype=flow.float32)
>>> pos_weight = flow.tensor([1.2, 1.3, 1.4], dtype=flow.float32)
>>> m = flow.nn.BCEWithLogitsLoss(weight=weight, pos_weight=pos_weight, reduction="none")
>>> out = m(input, target)
>>> out
tensor([[2.9266, 1.5552, 1.1087],
        [0.9676, 2.0750, 5.9554],
        [0.9676, 2.0750, 5.9554]], dtype=oneflow.float32)
>>> m = flow.nn.BCEWithLogitsLoss(weight=weight, pos_weight=pos_weight, reduction="mean")
>>> out = m(input, target)
>>> out
tensor(2.6207, dtype=oneflow.float32)
>>> m = flow.nn.BCEWithLogitsLoss(weight=weight, pos_weight=pos_weight, reduction="sum")
>>> out = m(input, target)
>>> out
tensor(23.5865, dtype=oneflow.float32)
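One common way to get the numerical stability mentioned above is to use log σ(x) = -softplus(-x) and log(1 - σ(x)) = -x - softplus(-x). These turn the per-element loss into weight * [(1 - y) * x + (1 + (pos_weight - 1) * y) * softplus(-x)], which never exponentiates a large positive number. The sketch below is in pure Python. Whether OneFlow uses exactly this rewrite internally is an assumption, and `bce_with_logits` is a hypothetical helper.

```python
import math


def softplus_neg(x):
    """log(1 + exp(-x)), rewritten so exp never overflows."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit, target, weight=1.0, pos_weight=1.0):
    """Per-element loss -w * [p*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].

    Uses log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -x - softplus(-x),
    so the loss becomes w * [(1-y)*x + (1 + (p-1)*y) * softplus(-x)].
    """
    log_weight = 1.0 + (pos_weight - 1.0) * target
    return weight * ((1.0 - target) * logit + log_weight * softplus_neg(logit))


print(round(bce_with_logits(0.2, 1, weight=2.0, pos_weight=1.3), 4))  # → 1.5552
print(bce_with_logits(-1000.0, 1))  # → 1000.0, without overflow
```

Summing this helper over the nine elements of the example (weight 2, pos_weight [1.2, 1.3, 1.4] per column) gives the documented "sum" result.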
-
class oneflow.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it’s common terminology to call this Temporal Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C)\) or \((N, C, L)\)
Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = flow.Tensor(np.random.randn(20, 100))
>>> m = flow.nn.BatchNorm1d(100)
>>> y = m(x)
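The pure-Python sketch below walks through one training-mode step on an (N, C) input with γ = 1 and β = 0. It normalizes with the biased batch variance and applies the momentum update rule from the Note to the running statistics. `batch_norm_1d_train` is a hypothetical helper. Updating the running variance with the *unbiased* batch variance is an assumption borrowed from PyTorch's behaviour.

```python
import math


def batch_norm_1d_train(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """One training-mode BatchNorm1d step on an (N, C) input given as N rows (N > 1).

    gamma = 1 and beta = 0 (their defaults). Returns the normalized rows and the
    updated running statistics. Assumption: as in PyTorch, the running variance
    is updated with the unbiased batch variance; normalization uses the biased one.
    """
    n, c = len(x), len(x[0])
    y = [[0.0] * c for _ in range(n)]
    new_mean, new_var = [], []
    for j in range(c):
        column = [row[j] for row in x]
        mean = sum(column) / n
        sq_dev = sum((v - mean) ** 2 for v in column)
        biased_var = sq_dev / n
        for i in range(n):
            y[i][j] = (x[i][j] - mean) / math.sqrt(biased_var + eps)
        # x_new = (1 - momentum) * x_hat + momentum * x_t
        new_mean.append((1 - momentum) * running_mean[j] + momentum * mean)
        new_var.append((1 - momentum) * running_var[j] + momentum * sq_dev / (n - 1))
    return y, new_mean, new_var


y, rm, rv = batch_norm_1d_train([[1.0], [3.0]], running_mean=[0.0], running_var=[1.0])
print(rm, rv)  # running_mean ≈ [0.2], running_var ≈ [1.1]
```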
-
class oneflow.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = flow.Tensor(np.random.randn(4, 2, 8, 3))
>>> m = flow.nn.BatchNorm2d(num_features=2, eps=1e-5, momentum=0.1)
>>> y = m(x)
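The phrase "statistics on (N, H, W) slices" means there is exactly one mean and one variance per channel, pooled over every sample and spatial position. The hypothetical helper below computes them from a nested (N, C, H, W) list, using the biased estimator as stated above.

```python
def channel_stats_2d(x):
    """Per-channel mean and biased variance of an (N, C, H, W) nested list.

    Each channel's statistics pool every value in its (N, H, W) slice.
    """
    num_channels = len(x[0])
    means, variances = [], []
    for c in range(num_channels):
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return means, variances


# N=2, C=2, H=1, W=2: channel 0 holds 1, 2, 3, 4 and channel 1 is constant.
x = [
    [[[1.0, 2.0]], [[5.0, 5.0]]],
    [[[3.0, 4.0]], [[5.0, 5.0]]],
]
print(channel_stats_2d(x))  # → ([2.5, 5.0], [1.25, 0.0])
```

These per-channel values play the roles of E[x] and Var[x] in the formula during training.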
-
class oneflow.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, D, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = flow.Tensor(np.random.randn(3, 2, 5, 8, 4))
>>> m = flow.nn.BatchNorm3d(num_features=2, eps=1e-5, momentum=0.1)
>>> y = m(x)
>>> y.size()
oneflow.Size([3, 2, 5, 8, 4])
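The momentum parameter above can be either a float (the exponential update rule in the Note) or None (a cumulative moving average, i.e. a simple average of all batch statistics seen so far). The sketch below contrasts the two on a sequence of batch means. Using a factor of 1 / num_batches_tracked when momentum is None is an assumption modelled on PyTorch, and `update_running_stat` is a hypothetical helper.

```python
def update_running_stat(running, observed, momentum, num_batches_tracked):
    """Update one running statistic after a batch.

    momentum is a float for the exponential rule
    x_new = (1 - momentum) * x_hat + momentum * x_t, or None for a cumulative
    moving average (assumption: PyTorch-style factor 1 / num_batches_tracked,
    where num_batches_tracked already counts the current batch).
    """
    factor = 1.0 / num_batches_tracked if momentum is None else momentum
    return (1.0 - factor) * running + factor * observed


batch_means = [2.0, 4.0, 6.0]
ema, cma = 0.0, 0.0
for step, m in enumerate(batch_means, start=1):
    ema = update_running_stat(ema, m, momentum=0.1, num_batches_tracked=step)
    cma = update_running_stat(cma, m, momentum=None, num_batches_tracked=step)
print(round(ema, 4), round(cma, 4))  # → 1.122 4.0
```

The cumulative average lands exactly on the mean of the three batch means, while the exponential average with momentum 0.1 is still close to its initial value.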
-
class oneflow.nn.CELU(alpha: float = 1.0, inplace: bool = False)
Applies the element-wise function:
\[\text{CELU}(x, \alpha) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha (\exp(x / \alpha) - 1) & \text{otherwise} \end{cases}\]
- Parameters
alpha – the \(\alpha\) value for the CELU formulation. Default: 1.0
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-0.5, 0, 0.5]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> celu = flow.nn.CELU(alpha=0.5)
>>> out = celu(input)
>>> out
tensor([-0.3161, 0.0000, 0.5000], dtype=oneflow.float32)
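Written out for a single scalar, the CELU formula is a one-line branch. The pure-Python version below (a hypothetical `celu` helper, not OneFlow's implementation) reproduces the example's values. It also shows that outputs for large negative inputs saturate at -alpha.

```python
import math


def celu(x, alpha=1.0):
    """CELU(x) = x if x >= 0 else alpha * (exp(x / alpha) - 1)."""
    return x if x >= 0 else alpha * (math.exp(x / alpha) - 1.0)


print([round(celu(v, alpha=0.5), 4) for v in (-0.5, 0.0, 0.5)])  # → [-0.3161, 0.0, 0.5]
```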
-
class oneflow.nn.COCOReader(annotation_file: str, image_dir: str, batch_size: int, shuffle: bool = True, random_seed: Optional[int] = None, group_by_aspect_ratio: bool = True, remove_images_without_annotations: bool = True, stride_partition: bool = True, device: Optional[Union[oneflow._oneflow_internal.device, str]] = None, placement: Optional[oneflow._oneflow_internal.placement] = None, sbp: Optional[Union[oneflow._oneflow_internal.sbp.sbp, List[oneflow._oneflow_internal.sbp.sbp]]] = None)
-
class oneflow.nn.CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)
The Connectionist Temporal Classification loss. The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.CTCLoss.html.
Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be \(\leq\) the input length.
- Parameters
blank (int, optional) – blank label. Default \(0\).
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken, 'sum': the output losses will be summed. Default: 'mean'
zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
- Shape:
Log_probs: Tensor of size \((T, N, C)\), where \(T = \text{input length}\), \(N = \text{batch size}\), and \(C = \text{number of classes (including blank)}\).
Targets: Tensor of size \((N, S)\) or \((\operatorname{sum}(\text{target_lengths}))\), where \(N = \text{batch size}\) and \(S = \text{max target length, if shape is } (N, S)\). It represents the target sequences. Each element in the target sequence is a class index, and a target index cannot be the blank index (default=0). In the \((N, S)\) form, targets are padded to the length of the longest sequence, and stacked. In the \((\operatorname{sum}(\text{target_lengths}))\) form, the targets are assumed to be un-padded and concatenated within 1 dimension.
Input_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represents the lengths of the inputs (must each be \(\leq T\)). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
Target_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is \((N,S)\), target_lengths are effectively the stop index \(s_n\) for each target sequence, such that target_n = targets[n,0:s_n] for each target in a batch. Lengths must each be \(\leq S\). If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
- Reference:
A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf
For example:
>>> import oneflow as flow
>>> log_probs = flow.tensor(
...     [
...         [[-1.1031, -0.7998, -1.5200], [-0.9808, -1.1363, -1.1908]],
...         [[-1.2258, -1.0665, -1.0153], [-1.1135, -1.2331, -0.9671]],
...         [[-1.3348, -0.6611, -1.5118], [-0.9823, -1.2355, -1.0941]],
...         [[-1.3850, -1.3273, -0.7247], [-0.8235, -1.4783, -1.0994]],
...         [[-0.9049, -0.8867, -1.6962], [-1.4938, -1.3630, -0.6547]],
...     ], dtype=flow.float32)
>>> targets = flow.tensor([[1, 2, 2], [1, 2, 2]], dtype=flow.int32)
>>> input_lengths = flow.tensor([5, 5], dtype=flow.int32)
>>> target_lengths = flow.tensor([3, 3], dtype=flow.int32)
>>> loss_mean = flow.nn.CTCLoss()
>>> out = loss_mean(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(1.1376, dtype=oneflow.float32)
>>> loss_sum = flow.nn.CTCLoss(blank=0, reduction="sum")
>>> out = loss_sum(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(6.8257, dtype=oneflow.float32)
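"Sums over the probability of possible alignments" is usually computed with the forward (alpha) recursion from the Graves et al. paper cited above. The recursion works over the target interleaved with blanks and runs in log space. The sketch below handles one unbatched sequence and applies no reduction. It illustrates the algorithm and is not OneFlow's kernel; `ctc_loss_single` is a hypothetical helper. Under the 'mean' reduction, each sequence's loss would then be divided by its target length and averaged over the batch.

```python
import math


def _logsumexp(values):
    """Stable log(sum(exp(v) for v in values)); returns -inf if all are -inf."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_loss_single(log_probs, target, blank=0):
    """CTC negative log-likelihood for one unbatched sequence.

    log_probs: T rows, each holding C log-probabilities for one time step.
    target: list of class indices, none of which may equal `blank`.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., lS, blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]  # start with a blank ...
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]  # ... or with the first label
    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one
            # Skipping the blank in between is only allowed for two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha
    # A valid alignment ends on the last label or on the trailing blank.
    final = alpha[-2:] if size > 1 else alpha[-1:]
    return -_logsumexp(final)


half = math.log(0.5)
# Two classes (blank=0, label 1), uniform probabilities, T=3, target [1, 1]:
# only the path 1, blank, 1 collapses to [1, 1], so the loss is log(8).
print(ctc_loss_single([[half, half]] * 3, [1, 1]))  # ≈ 2.0794
```

When the input is too short to fit the target (for example T=2 for the target [1, 1]), the loss is infinite. This is the case the zero_infinity flag is meant for.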
-
class oneflow.nn.CoinFlip(batch_size=1, random_seed=None, probability=0.5, device=None, placement=None, sbp=None)
The documentation is referenced from: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/supported_ops_legacy.html#nvidia.dali.ops.CoinFlip.
Generates random boolean values following a Bernoulli distribution.
The probability of generating a value 1 (true) is determined by the probability argument.
The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the input, if provided. If none are present, a single value per sample is generated.
- Parameters
batch_size (int, optional) – Maximum batch size of the pipeline. Negative values for this parameter are invalid; the default value may only be used with a serialized pipeline (the value stored in the serialized pipeline is used instead). In most cases, the actual batch size of the pipeline will be equal to the maximum one. Default: 1
random_seed (int, optional) – Random seed. Default: None
probability (float, optional) – Probability of value 1. Default: 0.5
device (oneflow.device, optional) – Desired device of returned tensor. Default: if None, uses the current device for the default tensor type.
placement (oneflow.placement, optional) – Desired placement of returned global tensor. Default: if None, the returned tensor is local one using the argument device.
sbp (oneflow.sbp.sbp or tuple of oneflow.sbp.sbp, optional) – Desired sbp descriptor of returned global tensor. Default: if None, the returned tensor is local one using the argument device.
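The Bernoulli sampling described above can be illustrated with Python's standard `random` module. The hypothetical `coin_flip` helper below is not OneFlow's implementation and ignores device, placement and sbp. It shows how probability and random_seed relate to the generated 0/1 values.

```python
import random


def coin_flip(batch_size=1, probability=0.5, random_seed=None):
    """Draw batch_size Bernoulli(probability) samples as 0/1 integers.

    random.Random(seed).random() is uniform on [0, 1), so each comparison
    is true with the requested probability.
    """
    rng = random.Random(random_seed)
    return [1 if rng.random() < probability else 0 for _ in range(batch_size)]


samples = coin_flip(batch_size=10_000, probability=0.3, random_seed=0)
print(sum(samples) / len(samples))  # close to 0.3
```

A fixed random_seed makes the draw reproducible, which mirrors the purpose of the random_seed parameter.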
-
class oneflow.nn.CombinedMarginLoss(m1: float = 1.0, m2: float = 0.0, m3: float = 0.0)
This operator implements “margin_softmax” in InsightFace: https://github.com/deepinsight/insightface/blob/master/recognition/arcface_mxnet/train.py
InsightFace builds margin_softmax from multiple operators; this operator fuses them for speed.
Applies the function:
\[{\rm CombinedMarginLoss}(x_i, label) = \begin{cases} \cos(m_1\cdot\arccos x_i+m_2) - m_3 & {\rm if} \ i = label \\ x_i & {\rm otherwise} \end{cases}\]
- Parameters
x (oneflow.Tensor) – A Tensor
label (oneflow.Tensor) – label with integer data type
m1 (float) – loss m1 parameter
m2 (float) – loss m2 parameter
m3 (float) – loss m3 parameter
Note
Here are some special cases:
when \(m_1=1, m_2\neq 0, m_3=0\), CombinedMarginLoss is equivalent to ArcFace.
when \(m_1=1, m_2=0, m_3\neq 0\), CombinedMarginLoss is equivalent to CosFace (a.k.a. AM-Softmax).
when \(m_1\gt 1, m_2=m_3=0\), CombinedMarginLoss is equivalent to A-Softmax.
- Returns
A Tensor
- Return type
oneflow.Tensor
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> np_x = np.array([[-0.7027179, 0.0230609], [-0.02721931, -0.16056311], [-0.4565852, -0.64471215]])
>>> np_label = np.array([0, 1, 1])
>>> x = flow.tensor(np_x, dtype=flow.float32)
>>> label = flow.tensor(np_label, dtype=flow.int32)
>>> loss_func = flow.nn.CombinedMarginLoss(0.3, 0.5, 0.4)
>>> out = loss_func(x, label)
>>> out
tensor([[-0.0423, 0.0231],
        [-0.0272, 0.1237],
        [-0.4566, -0.0204]], dtype=oneflow.float32)
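Row by row, the formula only changes the entry at the label index and leaves the others as they are. The pure-Python sketch below applies it to one row of cosine values. `combined_margin` is a hypothetical helper, not the fused OneFlow kernel. It reproduces the first row of the example, and with m1=1, m2=m3=0 it returns the input unchanged.

```python
import math


def combined_margin(row, label, m1=1.0, m2=0.0, m3=0.0):
    """Apply cos(m1 * arccos(x) + m2) - m3 to the label entry of one row of cosines."""
    return [
        math.cos(m1 * math.acos(v) + m2) - m3 if i == label else v
        for i, v in enumerate(row)
    ]


# First row of the example, with m1=0.3, m2=0.5, m3=0.4.
print([round(v, 4) for v in combined_margin([-0.7027179, 0.0230609], 0, 0.3, 0.5, 0.4)])
# → [-0.0423, 0.0231]
```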
-
class oneflow.nn.ConstantPad1d(padding: Union[int, tuple, list], value: Union[int, float] = 0)
Pads the input tensor boundaries with a constant value. The interface is consistent with PyTorch, and referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ConstantPad1d.html.
For N-dimensional padding, use oneflow.nn.functional.pad().
- Parameters
padding (int, list, tuple) – the size of the padding. If it is an int, the same padding is used on both boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\))
value (int, float) – The constant value used for padding. Defaults to 0.
- Shape:
Input: \((N, C, W_{in})\)
Output: \((N, C, W_{out})\) where
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(np.arange(8).reshape(2,2,2).astype(np.float32))
>>> m = flow.nn.ConstantPad1d(padding=[1, 2], value=9.9999)
>>> output = m(input)
>>> output
tensor([[[9.9999, 0.0000, 1.0000, 9.9999, 9.9999],
         [9.9999, 2.0000, 3.0000, 9.9999, 9.9999]],
        [[9.9999, 4.0000, 5.0000, 9.9999, 9.9999],
         [9.9999, 6.0000, 7.0000, 9.9999, 9.9999]]], dtype=oneflow.float32)
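On a single row, the padding rule and the W_out formula reduce to list concatenation. The hypothetical `constant_pad1d` helper below reproduces the first row of the example, where padding=[1, 2] turns a width of 2 into 2 + 1 + 2 = 5.

```python
def constant_pad1d(row, padding, value=0):
    """Pad one W-length row; padding is an int or a (left, right) pair."""
    left, right = (padding, padding) if isinstance(padding, int) else padding
    return [value] * left + list(row) + [value] * right


print(constant_pad1d([0.0, 1.0], padding=[1, 2], value=9.9999))
# → [9.9999, 0.0, 1.0, 9.9999, 9.9999]
```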
-
class oneflow.nn.ConstantPad2d(padding: Union[int, tuple, list], value: Union[int, float] = 0)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ConstantPad2d.html.
This operator pads the input with a user-specified constant value. The amount of padding is set with the padding parameter.
- Parameters
padding (int, tuple, list) – the size of the padding. If it is an int, the same padding is used on all boundaries. If a 4-tuple, uses (\(\mathrm{padding_{left}}\), \(\mathrm{padding_{right}}\), \(\mathrm{padding_{top}}\), \(\mathrm{padding_{bottom}}\))
value (int, float) – The constant value used for padding. Defaults to 0.
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\(H_{out} = H_{in} + \mathrm{padding_{top}} + \mathrm{padding_{bottom}}\) \(W_{out} = W_{in} + \mathrm{padding_{left}} + \mathrm{padding_{right}}\)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.ConstantPad2d((2, 2, 1, 1), 1)
>>> input = flow.tensor(np.arange(18).reshape((1, 2, 3, 3)).astype(np.float32))
>>> output = m(input)
>>> output.shape
oneflow.Size([1, 2, 5, 7])
>>> output
tensor([[[[ 1., 1., 1., 1., 1., 1., 1.],
          [ 1., 1., 0., 1., 2., 1., 1.],
          [ 1., 1., 3., 4., 5., 1., 1.],
          [ 1., 1., 6., 7., 8., 1., 1.],
          [ 1., 1., 1., 1., 1., 1., 1.]],
         [[ 1., 1., 1., 1., 1., 1., 1.],
          [ 1., 1., 9., 10., 11., 1., 1.],
          [ 1., 1., 12., 13., 14., 1., 1.],
          [ 1., 1., 15., 16., 17., 1., 1.],
          [ 1., 1., 1., 1., 1., 1., 1.]]]], dtype=oneflow.float32)
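The 4-tuple is ordered (left, right, top, bottom). The first pair pads the width (last dimension) and the second pair pads the height. The hypothetical helper below pads one plane given as a list of rows and reproduces the first channel of the example.

```python
def constant_pad2d(plane, padding, value=0):
    """Pad one H x W plane given as a list of rows.

    padding is an int or (left, right, top, bottom): the first pair pads the
    last dimension (W), the second pair pads H.
    """
    if isinstance(padding, int):
        padding = (padding,) * 4
    left, right, top, bottom = padding
    width = len(plane[0]) + left + right
    padded_rows = [[value] * left + list(r) + [value] * right for r in plane]
    top_rows = [[value] * width for _ in range(top)]
    bottom_rows = [[value] * width for _ in range(bottom)]
    return top_rows + padded_rows + bottom_rows


plane = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # first channel of the example
for row in constant_pad2d(plane, (2, 2, 1, 1), value=1):
    print(row)
```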
class oneflow.nn.ConstantPad3d(padding: Union[int, tuple, list], value: Union[int, float] = 0)
Pads the input tensor boundaries with a constant value. The interface is consistent with PyTorch, and the documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ConstantPad3d.html.
For N-dimensional padding, use flow.nn.functional.pad().
- Parameters
padding (int, list, tuple) – the size of the padding. If it is an int, the same padding is used on all boundaries. If it is a 6-tuple, it is interpreted as (\(\text{padding_left}\), \(\text{padding_right}\), \(\text{padding_top}\), \(\text{padding_bottom}\), \(\text{padding_front}\), \(\text{padding_back}\))
value (int, float) – The constant value used for padding. Defaults to 0.
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, D_{out}, H_{out}, W_{out})\) where
\(D_{out} = D_{in} + \text{padding_front} + \text{padding_back}\)
\(H_{out} = H_{in} + \text{padding_top} + \text{padding_bottom}\)
\(W_{out} = W_{in} + \text{padding_left} + \text{padding_right}\)
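The 6-tuple runs from the last dimension backwards (W, then H, then D), which is easy to get wrong. The NumPy sketch below, a hypothetical reference helper rather than OneFlow's implementation, maps the tuple onto np.pad and reproduces the output shape of the example that follows.

```python
import numpy as np


def constant_pad3d_reference(x, padding, value=0):
    """Pad the last three dims of an (N, C, D, H, W) array with a constant.

    padding is (left, right, top, bottom, front, back): the pairs go from
    the width dimension back to the depth dimension.
    """
    if isinstance(padding, int):
        padding = (padding,) * 6
    left, right, top, bottom, front, back = padding
    pad_width = ((0, 0), (0, 0), (front, back), (top, bottom), (left, right))
    return np.pad(x, pad_width, mode="constant", constant_values=value)


x = np.arange(8, dtype=np.int32).reshape(1, 1, 2, 2, 2)
y = constant_pad3d_reference(x, 1, value=9)
print(y.shape)  # (1, 1, 4, 4, 4)

# Asymmetric case: 1 on the left of W, 2 at the back of D.
z = constant_pad3d_reference(x, (1, 0, 0, 0, 0, 2), value=9)
print(z.shape)  # (1, 1, 4, 2, 3)
```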
Examples:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(np.arange(8).reshape(1,1,2,2,2).astype(np.int32))
>>> m = flow.nn.ConstantPad3d(padding=1, value=9)
>>> output = m(input)
>>> output
tensor([[[[[9, 9, 9, 9],
           [9, 9, 9, 9],
           [9, 9, 9, 9],
           [9, 9, 9, 9]],
          [[9, 9, 9, 9],
           [9, 0, 1, 9],
           [9, 2, 3, 9],
           [9, 9, 9, 9]],
          [[9, 9, 9, 9],
           [9, 4, 5, 9],
           [9, 6, 7, 9],
           [9, 9, 9, 9]],
          [[9, 9, 9, 9],
           [9, 9, 9, 9],
           [9, 9, 9, 9],
           [9, 9, 9, 9]]]]], dtype=oneflow.int32)
class oneflow.nn.Conv1d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]], stride: Union[int, Tuple[int]] = 1, padding: Union[str, int, Tuple[int]] = 0, dilation: Union[int, Tuple[int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros')
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Conv1d.html.
Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, and \(L\) is the length of the signal sequence.
stride controls the stride for the cross-correlation, a single number or a one-element tuple.
padding controls the amount of padding applied to the input. It can be either a string {'valid', 'same'} or a tuple of ints giving the amount of implicit padding applied on both sides.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
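To make the cross-correlation formula and the stride, padding and dilation semantics concrete, here is a naive NumPy reference for the groups=1 case with zero padding. It is an illustrative sketch (conv1d_reference is a hypothetical name), not how OneFlow computes Conv1d; the weight layout (out_channels, in_channels, kernel_size) follows the weight attribute documented for this module.

```python
import numpy as np


def conv1d_reference(x, weight, bias=None, stride=1, padding=0, dilation=1):
    """Naive 1D cross-correlation for groups=1 with zero padding.

    x: (N, C_in, L_in), weight: (C_out, C_in, K), bias: (C_out,) or None.
    """
    n, c_in, l_in = x.shape
    c_out, c_in_w, k = weight.shape
    assert c_in == c_in_w, "groups != 1 is not handled in this sketch"
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding)))  # zeros on both sides
    span = dilation * (k - 1) + 1  # extent of one dilated kernel window
    l_out = (l_in + 2 * padding - span) // stride + 1
    out = np.zeros((n, c_out, l_out), dtype=np.result_type(x, weight))
    for i in range(l_out):
        start = i * stride
        # Dilated window of K taps: shape (N, C_in, K)
        window = x[:, :, start : start + span : dilation]
        # Sum over input channels and kernel taps for every output channel
        out[:, :, i] = np.einsum("nck,ock->no", window, weight)
    if bias is not None:
        out += bias[None, :, None]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 16, 50))
w = rng.standard_normal((33, 16, 3))
y = conv1d_reference(x, w, stride=2)
print(y.shape)  # (20, 33, 24), the shapes used in the example below
```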
Note
padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn’t support any stride values other than 1.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int, tuple or str, optional) – Padding added to both sides of the input. Default: 0
padding_mode (string, optional) – 'zeros'. Default: 'zeros'
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- Shape:
Input: \((N, C_{in}, L_{in})\)
Output: \((N, C_{out}, L_{out})\) where
\[L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \text{kernel\_size}}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \text{kernel\_size}}\)
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> arr = np.random.randn(20, 16, 50)
>>> input = flow.Tensor(arr)
>>> m = nn.Conv1d(16, 33, 3, stride=2)
>>> output = m(input)
class oneflow.nn.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[str, int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros')
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Conv2d.html.
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.
stride controls the stride for the cross-correlation, a single number or a tuple.
padding controls the amount of implicit zero padding added to both sides of each dimension (padding points per side).
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example:
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out_channels}}{\text{in_channels}}\)).
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
Note
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as a “depthwise convolution”.
In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be performed with the arguments \((C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})\).
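How groups changes the parameter tensor can be seen from the weight shape given in the Attr list, \((\text{out\_channels}, \text{in\_channels}/\text{groups}, k_H, k_W)\). The sketch below (conv2d_weight_shape is a hypothetical helper, not a OneFlow API) computes that shape for an ordinary, a grouped and a depthwise convolution.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Weight shape (out_channels, in_channels // groups, kH, kW) of a Conv2d."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    return (out_channels, in_channels // groups, *kernel_size)


# groups=1: every output channel sees all 16 input channels.
print(conv2d_weight_shape(16, 32, 3))             # (32, 16, 3, 3)
# groups=2: two side-by-side convolutions, each over 8 input channels.
print(conv2d_weight_shape(16, 32, 3, groups=2))   # (32, 8, 3, 3)
# Depthwise with multiplier K=2: groups == in_channels, out_channels = 2 * in_channels.
print(conv2d_weight_shape(16, 32, 3, groups=16))  # (32, 1, 3, 3)
```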
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional) – 'zeros'. Default: 'zeros'
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
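The floor formula above is applied independently to every spatial dimension, so one small function covers Conv1d, Conv2d and Conv3d. In this sketch conv_output_size is a hypothetical helper; Python's // floors, which matches \(\lfloor\cdot\rfloor\) whenever the numerator is non-negative, as it is for valid configurations. It reproduces the shapes of the example below.

```python
def _as_tuple(value, n):
    """Expand an int into an n-tuple; pass tuples through unchanged."""
    return (value,) * n if isinstance(value, int) else tuple(value)


def conv_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution, one entry per input dimension."""
    n = len(in_size)
    ks, ss, ps, ds = (_as_tuple(v, n) for v in (kernel_size, stride, padding, dilation))
    return tuple(
        (i + 2 * p - d * (k - 1) - 1) // s + 1
        for i, k, s, p, d in zip(in_size, ks, ss, ps, ds)
    )


# Same arguments as the Conv2d example below: (H_in, W_in) = (50, 100)
print(conv_output_size((50, 100), (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)))
# (26, 100)
```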
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}},\) \(\text{kernel_size[0]}, \text{kernel_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel_size}[i]}\)
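The initialization bound above is easy to evaluate by hand. The sketch below computes \(\sqrt{k}\) and draws a weight tensor from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) with NumPy; it mirrors the documented formula only and is not OneFlow's initializer code (conv_init_bound is a hypothetical name). For the transposed modules, the same formula is documented with \(C_\text{out}\) in place of \(C_\text{in}\).

```python
import math

import numpy as np


def conv_init_bound(in_channels, kernel_size, groups=1):
    """sqrt(k) with k = groups / (C_in * prod(kernel_size)).

    kernel_size must be a tuple with one entry per spatial dimension.
    """
    k = groups / (in_channels * math.prod(kernel_size))
    return math.sqrt(k)


# Conv2d(16, 33, (3, 5)) from the example below: k = 1 / (16 * 15)
bound = conv_init_bound(16, (3, 5))
rng = np.random.default_rng(0)
weight = rng.uniform(-bound, bound, size=(33, 16, 3, 5))
print(round(bound, 4))  # 0.0645
```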
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> arr = np.random.randn(20, 16, 50, 100)
>>> input = flow.Tensor(arr)
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> output = m(input)
class oneflow.nn.Conv3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[str, int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros')
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Conv3d.html.
Applies a 3D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{in}, D, H, W)\) and output \((N, C_{out}, D_{out}, H_{out}, W_{out})\) can be precisely described as:
\[out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k)\]
where \(\star\) is the valid 3D cross-correlation operator.
stride controls the stride for the cross-correlation.
padding controls the amount of padding applied to the input. It can be either a string {'valid', 'same'} or a tuple of ints giving the amount of implicit padding applied on both sides.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the depth, height and width dimension
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
Note
padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn’t support any stride values other than 1.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int, tuple or str, optional) – Padding added to all six sides of the input. Default: 0
padding_mode (string, optional) – 'zeros'. Default: 'zeros'
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- Shape:
Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]}, \text{kernel\_size[2]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> arr = np.random.randn(1, 2, 5, 5, 5)
>>> input = flow.Tensor(arr)
>>> m = nn.Conv3d(2, 4, kernel_size=3, stride=1)
>>> output = m(input)
class oneflow.nn.ConvTranspose1d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]], stride: Union[int, Tuple[int]] = 1, padding: Union[int, Tuple[int]] = 0, output_padding: Union[int, Tuple[int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int]] = 1, padding_mode: str = 'zeros')
Applies a 1D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
This module supports TensorFloat32.
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See the note below for details.
output_padding controls the additional size added to one side of the output shape. See the note below for details.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
Note
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape; it does not actually add zero-padding to the output.
Note
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on randomness for background.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, L_{in})\)
Output: \((N, C_{out}, L_{out})\) where
\[L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel_size} - 1) + \text{output_padding} + 1\]
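The ambiguity that output_padding resolves can be shown with the two length formulas alone. In this sketch (helper names are hypothetical), a Conv1d with kernel 3, stride 2 and padding 1 maps inputs of length 7 and 8 to the same length 4, and the transposed formula above recovers either length depending on output_padding.

```python
def conv1d_len(l_in, k, stride=1, padding=0, dilation=1):
    """L_out of Conv1d (floor formula)."""
    return (l_in + 2 * padding - dilation * (k - 1) - 1) // stride + 1


def conv_transpose1d_len(l_in, k, stride=1, padding=0, output_padding=0, dilation=1):
    """L_out of ConvTranspose1d, as in the formula above."""
    return (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1


# With stride 2, two different input lengths collapse to the same output length ...
print(conv1d_len(7, 3, stride=2, padding=1), conv1d_len(8, 3, stride=2, padding=1))  # 4 4
# ... and output_padding selects which one the transposed layer reproduces.
print(conv_transpose1d_len(4, 3, stride=2, padding=1))                    # 7
print(conv_transpose1d_len(4, 3, stride=2, padding=1, output_padding=1))  # 8
```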
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \text{kernel\_size}}\)
class oneflow.nn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: str = 'zeros')
Applies a 2D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) where
\[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel_size}[0] - 1) + \text{output_padding}[0] + 1\]\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel_size}[1] - 1) + \text{output_padding}[1] + 1\]
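Evaluating the two formulas above for the example below gives the printed size oneflow.Size([20, 33, 93, 100]). The following sketch, a hypothetical helper that takes one tuple entry per spatial dimension, does the arithmetic.

```python
def conv_transpose_output_size(in_size, kernel_size, stride, padding, output_padding, dilation):
    """Spatial output size of a transposed convolution; all args are per-dim tuples."""
    return tuple(
        (i - 1) * s - 2 * p + d * (k - 1) + op + 1
        for i, k, s, p, op, d in zip(in_size, kernel_size, stride, padding, output_padding, dilation)
    )


# ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) on a 50 x 100 input
print(conv_transpose_output_size((50, 100), (3, 5), (2, 1), (4, 2), (0, 0), (1, 1)))
# (93, 100)
```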
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{in_channels}, \frac{\text{out_channels}}{\text{groups}},\) \(\text{kernel_size[0]}, \text{kernel_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel_size}[i]}\)
Examples:
>>> import numpy as np
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> m = m.to("cuda")
>>> input = flow.Tensor(np.random.randn(20, 16, 50, 100), device=flow.device("cuda"))
>>> output = m(input)
>>> output.size()
oneflow.Size([20, 33, 93, 100])
class oneflow.nn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[int, Tuple[int, int, int]] = 0, output_padding: Union[int, Tuple[int, int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int, int]] = 1, padding_mode: str = 'zeros')
Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes.
This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).
This module supports TensorFloat32.
stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See the note below for details.
output_padding controls the additional size added to one side of the output shape. See the note below for details.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
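The description above (each input value scales the kernel, and the scaled copies are summed into the output) can be illustrated in one dimension. The NumPy sketch below is a simplification, assuming a single channel, no padding, no dilation and groups=1; the helper names are hypothetical. It also checks numerically that the transposed operator is the adjoint of the strided convolution, which is the sense in which the module can be seen as the gradient of the convolution with respect to its input.

```python
import numpy as np


def conv1d_single(x, w, stride=1):
    """Valid cross-correlation of a 1D signal x with kernel w (one channel)."""
    k = len(w)
    l_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + k], w) for i in range(l_out)])


def conv_transpose1d_single(y, w, stride=1):
    """Transposed convolution: each input value scales the kernel, and the
    result is scatter-added into the output at offset i * stride."""
    k = len(w)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        out[i * stride : i * stride + k] += v * w
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(9)
w = rng.standard_normal(3)
y = rng.standard_normal(4)  # conv1d_single(x, w, stride=2) has length 4

# Adjoint check: <conv(x), y> == <x, conv_transpose(y)>
lhs = np.dot(conv1d_single(x, w, stride=2), y)
rhs = np.dot(x, conv_transpose1d_single(y, w, stride=2))
print(np.isclose(lhs, rhs))  # True
```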
The parameters kernel_size, stride, padding, output_padding can either be:
a single int – in which case the same value is used for the depth, height and width dimensions
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
Note
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv3d and a ConvTranspose3d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv3d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape; it does not actually add zero-padding to the output.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
- Shape:
Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) where
\[D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel_size}[0] - 1) + \text{output_padding}[0] + 1\]\[H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel_size}[1] - 1) + \text{output_padding}[1] + 1\]\[W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel_size}[2] - 1) + \text{output_padding}[2] + 1\]
- Attr:
- weight (Tensor): the learnable weights of the module of shape \((\text{in_channels}, \frac{\text{out_channels}}{\text{groups}},\) \(\text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel_size}[i]}\)
- bias (Tensor): the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel_size}[i]}\)
Examples:
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose3d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))
>>> input = flow.randn(20, 16, 10, 50, 100)
>>> output = m(input)
class oneflow.nn.CosineSimilarity(dim: Optional[int] = 1, eps: Optional[float] = 1e-08)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/stable/generated/torch.nn.CosineSimilarity.html#torch.nn.CosineSimilarity
Returns the cosine similarity between \(x_1\) and \(x_2\), computed along dim.
\[\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)}.\]
- Parameters
dim (int, optional) – Dimension where cosine similarity is computed. Default: 1
eps (float, optional) – Small value to avoid division by zero. Default: 1e-8
- Shape:
Input1: \((\ast_1, D, \ast_2)\) where D is at position dim.
Input2: \((\ast_1, D, \ast_2)\), same number of dimensions as x1, matching x1 size at dimension dim, and broadcastable with x1 at other dimensions.
Output: \((\ast_1, \ast_2)\)
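A direct NumPy transcription of the similarity formula above follows, useful for checking results or the broadcasting rule for Input2. It is a sketch of the formula as written on this page (product of norms clamped by eps), not OneFlow's kernel, and cosine_similarity here is a hypothetical standalone function.

```python
import numpy as np


def cosine_similarity(x1, x2, dim=1, eps=1e-8):
    """x1 . x2 / max(||x1|| * ||x2||, eps), reduced along dim; x2 may broadcast."""
    dot = np.sum(x1 * x2, axis=dim)
    norms = np.linalg.norm(x1, axis=dim) * np.linalg.norm(x2, axis=dim)
    return dot / np.maximum(norms, eps)


a = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([[0.0, 1.0], [2.0, 2.0]])
print(cosine_similarity(a, b))  # orthogonal rows give 0, parallel rows give 1
```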
For example:
>>> import oneflow as flow
>>> from oneflow import nn
>>> input1 = flow.randn(100, 128)
>>> input2 = flow.randn(100, 128)
>>> cos = nn.CosineSimilarity(dim=1, eps=1e-6)
>>> output = cos(input1, input2)
class oneflow.nn.CropMirrorNormalize(color_space='BGR', output_layout='NCHW', crop_h=0, crop_w=0, crop_pos_y=0.5, crop_pos_x=0.5, mean=[0.0], std=[1.0], output_dtype=oneflow.float)
The documentation is referenced from: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/supported_ops_legacy.html#nvidia.dali.ops.CropMirrorNormalize.
Performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting.
Normalization takes the input images and produces the output by using the following formula:
\[output = (input - mean) / std\]
Note
If no cropping arguments are specified, only mirroring and normalization will occur.
This operator allows sequence inputs and supports volumetric data.
- Parameters
color_space (str, optional) – The color space of the input image. Default: “BGR”
output_layout (str, optional) – Tensor data layout for the output. Default: “NCHW”
crop_h (int, optional) – Cropping window height (in pixels). Default: 0
crop_w (int, optional) – Cropping window width (in pixels). Default: 0
crop_pos_y (float, optional) – Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner). Default: 0.5
crop_pos_x (float, optional) – Normalized (0.0 - 1.0) horizontal position of the start of the cropping window (typically, the upper left corner). Default: 0.5
mean (float or list of float, optional) – Mean pixel values for image normalization. Default: [0.0]
std (float or list of float, optional) – Standard deviation values for image normalization. Default: [1.0]
output_dtype (oneflow.dtype, optional) – Output data type. Default: oneflow.float
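A minimal NumPy sketch of the crop, normalize and NHWC-to-NCHW path for a single HWC image follows. Two points are assumptions rather than facts stated on this page: the crop's top-left corner is placed at crop_pos * (input_size - crop_size), which is our reading of how the referenced DALI operator anchors the window, and mirroring is omitted because no mirror parameter is listed here. The helper name is hypothetical.

```python
import numpy as np


def crop_normalize_reference(img, crop_h, crop_w, crop_pos_y=0.5, crop_pos_x=0.5,
                             mean=(0.0,), std=(1.0,)):
    """Crop an HWC image, normalize per channel, and return a CHW float32 array.

    Assumption: the window's top-left corner is crop_pos * (size - crop_size).
    Mirroring is not modeled.
    """
    h, w, _ = img.shape
    y0 = int(round(crop_pos_y * (h - crop_h)))
    x0 = int(round(crop_pos_x * (w - crop_w)))
    crop = img[y0 : y0 + crop_h, x0 : x0 + crop_w].astype(np.float32)
    # output = (input - mean) / std, broadcast over the channel axis
    out = (crop - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return out.transpose(2, 0, 1)  # HWC -> CHW


img = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
out = crop_normalize_reference(img, 4, 4, mean=(1.0, 2.0, 3.0), std=(2.0, 2.0, 2.0))
print(out.shape)  # (3, 4, 4)
```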
class oneflow.nn.CrossEntropyLoss(weight: Optional[oneflow.Tensor] = None, ignore_index: int = -100, reduction: str = 'mean')
This criterion combines LogSoftmax and NLLLoss in one single class. It is useful when training a classification problem with C classes.
The input is expected to contain raw, unnormalized scores for each class.
input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later).
This criterion expects a class index in the range \([0, C-1]\) as the target for each value of a 1D tensor of size minibatch.
The loss can be described as:
\[\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)\]
It can also be used for higher-dimensional inputs, such as 2D images, by providing an input of size \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape (see below).
- Parameters
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Default: 'mean'
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(
...     [[-0.1664078, -1.7256707, -0.14690138],
...      [-0.21474946, 0.53737473, 0.99684894],
...      [-1.135804, -0.50371903, 0.7645404]], dtype=flow.float32)
>>> target = flow.tensor(np.array([0, 1, 2]), dtype=flow.int32)
>>> out = flow.nn.CrossEntropyLoss(reduction="none")(input, target)
>>> out
tensor([0.8020, 1.1167, 0.3583], dtype=oneflow.float32)
>>> out_sum = flow.nn.CrossEntropyLoss(reduction="sum")(input, target)
>>> out_sum
tensor(2.2769, dtype=oneflow.float32)
>>> out_mean = flow.nn.CrossEntropyLoss(reduction="mean")(input, target)
>>> out_mean
tensor(0.7590, dtype=oneflow.float32)
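The per-sample formula \(-x[class] + \log\sum_j \exp(x[j])\) can be checked against the example above with a short NumPy reference. It subtracts the row maximum before exponentiating for numerical stability; cross_entropy_reference is a hypothetical name, and class weights and ignore_index are not modeled, so its 'mean' is a plain average.

```python
import numpy as np


def cross_entropy_reference(x, target, reduction="mean"):
    """loss_i = -x[i, target_i] + logsumexp(x[i, :]) for 2D input (minibatch, C)."""
    m = x.max(axis=1, keepdims=True)
    logsumexp = m[:, 0] + np.log(np.exp(x - m).sum(axis=1))  # stable log-sum-exp
    loss = -x[np.arange(len(target)), target] + logsumexp
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.mean()  # unweighted mean in this sketch


x = np.array([[-0.1664078, -1.7256707, -0.14690138],
              [-0.21474946, 0.53737473, 0.99684894],
              [-1.135804, -0.50371903, 0.7645404]], dtype=np.float32)
target = np.array([0, 1, 2])
print(cross_entropy_reference(x, target, reduction="none"))
```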
class oneflow.nn.Dropout(p: float = 0.5, inplace: bool = False, generator=None)
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.
This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper “Improving neural networks by preventing co-adaptation of feature detectors”.
Furthermore, the outputs are scaled by a factor of \(\frac{1}{1-p}\) during training. This means that during evaluation the module simply computes an identity function.
Additionally, an extra tensor addend with the same shape as the input tensor can be passed. The addend tensor is added to the result after dropout, which is useful in a model’s residual connection structure.
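The behaviour described above (Bernoulli masking with probability p, \(\frac{1}{1-p}\) scaling of the survivors, identity at evaluation time, and an optional addend) can be sketched in NumPy as follows. This is an illustrative model, not OneFlow's kernel; dropout_reference and its training flag are hypothetical, and adding the addend in evaluation mode as well is an assumption.

```python
import numpy as np


def dropout_reference(x, p=0.5, training=True, addend=None, rng=None):
    """Inverted dropout followed by an optional residual addend."""
    x = np.asarray(x, dtype=float)
    if not training or p == 0:
        out = x.copy()  # identity: evaluation mode, or nothing to drop
    elif p == 1:
        out = np.zeros_like(x)  # every element is dropped
    else:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(x.shape) >= p  # each element kept with probability 1 - p
        out = np.where(keep, x / (1.0 - p), 0.0)  # rescale the survivors
    if addend is not None:
        out = out + addend  # added after dropout (assumed in eval mode too)
    return out


x = np.ones((3, 4))
out = dropout_reference(x, p=0.5, rng=np.random.default_rng(0))
print(np.unique(out))  # a subset of [0. 2.]
```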
- Parameters
p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False
generator – A pseudorandom number generator for sampling
- Shape:
Input: \((*)\). Input can be of any shape
Output: \((*)\). Output is of the same shape as input
For example:
example 1:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.Dropout(p=0)
>>> arr = np.array(
...     [
...         [-0.7797, 0.2264, 0.2458, 0.4163],
...         [0.4299, 0.3626, -0.4892, 0.4141],
...         [-1.4115, 1.2183, -0.5503, 0.6520],
...     ]
... )
>>> x = flow.Tensor(arr)
>>> y = m(x)
>>> y
tensor([[-0.7797, 0.2264, 0.2458, 0.4163],
        [ 0.4299, 0.3626, -0.4892, 0.4141],
        [-1.4115, 1.2183, -0.5503, 0.6520]], dtype=oneflow.float32)
example 2:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.Dropout(p=0)
>>> arr = np.array(
...     [
...         [-0.7797, 0.2264, 0.2458, 0.4163],
...         [0.4299, 0.3626, -0.4892, 0.4141],
...         [-1.4115, 1.2183, -0.5503, 0.6520],
...     ]
... )
>>> x = flow.Tensor(arr)
>>> addend = flow.ones((3, 4), dtype=flow.float32)
>>> y = m(x, addend=addend)
>>> y
tensor([[ 0.2203, 1.2264, 1.2458, 1.4163],
        [ 1.4299, 1.3626, 0.5108, 1.4141],
        [-0.4115, 2.2183, 0.4497, 1.6520]], dtype=oneflow.float32)
- Attr:
- inplace: bool
- p: float
class oneflow.nn.ELU(alpha: float = 1.0, inplace: bool = False)
Applies the element-wise function:
\[\begin{split}\text{ELU}(x) = \begin{cases} x & \text{ if } x \gt 0 \\ \alpha*(exp(x)-1) & \text{ if } x \le 0 \\ \end{cases}\end{split}\]
- Parameters
alpha – the \(\alpha\) value for the ELU formulation. Default: 1.0
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means, any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np >>> import oneflow as flow >>> x = np.array([-0.5, 0, 0.5]).astype(np.float32) >>> input = flow.Tensor(x) >>> elu = flow.nn.ELU() >>> out = elu(input) >>> out tensor([-0.3935, 0.0000, 0.5000], dtype=oneflow.float32)
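The piecewise definition above can be checked with a short NumPy sketch. It uses NumPy only, and the helper name `elu` is hypothetical rather than part of OneFlow.

```python
import numpy as np


def elu(x, alpha=1.0):
    """NumPy sketch of ELU: x if x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=np.float64)
    # expm1 is evaluated on min(x, 0) so the unused branch cannot overflow for large x
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


print(elu([-0.5, 0.0, 0.5]))  # approx. [-0.3935, 0.0, 0.5], matching the example above
```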
-
class
oneflow.nn.
Embedding
(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[oneflow.Tensor] = None)¶ A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
- Parameters
num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.
max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default: 2.
scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default: False
For example:
>>> import numpy as np >>> import oneflow as flow >>> indices = flow.tensor([[1, 2, 4, 5], [4, 3, 2, 9]], dtype=flow.int) >>> m = flow.nn.Embedding(10, 3) >>> y = m(indices)
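Conceptually, the lookup is a row gather on the weight matrix, so the output shape is `indices.shape + (embedding_dim,)`. The NumPy sketch below shows that gather and a max_norm rescaling. The helper name `embedding_lookup` is hypothetical. As a simplification, it rescales only the returned rows rather than updating the stored weight, so it is not a statement of how OneFlow applies max_norm internally.

```python
import numpy as np


def embedding_lookup(weight, indices, max_norm=None, norm_type=2.0):
    """Hypothetical sketch: row gather plus optional max_norm renormalization."""
    out = weight[indices]  # shape: indices.shape + (embedding_dim,)
    if max_norm is not None:
        norms = np.linalg.norm(out, ord=norm_type, axis=-1, keepdims=True)
        # rows whose p-norm exceeds max_norm are rescaled to (just under) max_norm
        scale = np.where(norms > max_norm, max_norm / (norms + 1e-7), 1.0)
        out = out * scale
    return out


weight = np.random.default_rng(0).standard_normal((10, 3))  # num_embeddings=10, embedding_dim=3
indices = np.array([[1, 2, 4, 5], [4, 3, 2, 9]])
y = embedding_lookup(weight, indices)
print(y.shape)  # (2, 4, 3)
```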
-
class
oneflow.nn.
FakeQuantization
(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric')¶ Simulates the quantize and dequantize operations at training time.
The output will be computed as:
if quantization_scheme == “symmetric”:
\[ \begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit - 1} - 1\\& quant\_min = -quant\_max\\& clamp(round(x / scale), quant\_min, quant\_max) * scale\end{aligned}\end{align} \]
elif quantization_scheme == “affine”:
\[ \begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit} - 1\\& quant\_min = 0\\& (clamp(round(x / scale + zero\_point), quant\_min, quant\_max) - zero\_point) * scale\end{aligned}\end{align} \]
- Parameters
input (oneflow.Tensor) – the input value(s), in oneflow.float32.
scale (oneflow.Tensor) – quantization scale.
zero_point (oneflow.Tensor) – quantization zero_point.
quantization_bit (int) – Quantize input to uintX / intX, X can be in range [2, 8]. Defaults to 8.
quantization_scheme (str) – “symmetric” or “affine”, quantize to signed / unsigned integer. Defaults to “symmetric”.
quantization_formula (str) – Support “google” or “cambricon”.
- Returns
Input tensor after quantize and dequantize operations.
- Return type
oneflow.Tensor
For example:
>>> import numpy as np >>> import oneflow as flow >>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32) >>> input_tensor = flow.tensor( ... weight, dtype=flow.float32 ... ) >>> quantization_bit = 8 >>> quantization_scheme = "symmetric" >>> quantization_formula = "google" >>> per_layer_quantization = True >>> min_max_observer = flow.nn.MinMaxObserver(quantization_formula=quantization_formula, quantization_bit=quantization_bit, ... quantization_scheme=quantization_scheme, per_layer_quantization=per_layer_quantization) >>> fake_quantization = flow.nn.FakeQuantization(quantization_formula=quantization_formula, quantization_bit=quantization_bit, ... quantization_scheme=quantization_scheme) >>> scale, zero_point = min_max_observer( ... input_tensor, ... ) >>> output_tensor = fake_quantization( ... input_tensor, ... scale, ... zero_point, ... )
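The two formulas above can be transcribed into NumPy as a reference. In this sketch, `scale` and `zero_point` are plain scalars, as in per-layer quantization. In practice they would come from an observer such as MinMaxObserver. The function name `fake_quantize` is hypothetical. Note that `np.round` rounds exact .5 ties to even; whether OneFlow's kernel uses the same tie rule is an assumption not covered by the text above.

```python
import numpy as np


def fake_quantize(x, scale, zero_point=0.0, quantization_bit=8, quantization_scheme="symmetric"):
    """NumPy sketch of the quantize-then-dequantize formulas above (hypothetical helper)."""
    if quantization_scheme == "symmetric":
        quant_max = 2 ** (quantization_bit - 1) - 1
        quant_min = -quant_max
        return np.clip(np.round(x / scale), quant_min, quant_max) * scale
    if quantization_scheme == "affine":
        quant_max = 2 ** quantization_bit - 1
        quant_min = 0
        q = np.clip(np.round(x / scale + zero_point), quant_min, quant_max)
        return (q - zero_point) * scale
    raise ValueError("quantization_scheme must be 'symmetric' or 'affine'")


# 3-bit symmetric: integer grid is [-3, 3], so values snap to multiples of 0.5 within [-1.5, 1.5]
print(fake_quantize(np.array([-2.0, -0.3, 0.6, 5.0]), scale=0.5, quantization_bit=3))
```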
-
class
oneflow.nn.
Flatten
(start_dim: int = 1, end_dim: int = -1)¶ Flattens a contiguous range of dims into a tensor. For use with nn.Sequential.
- Parameters
start_dim – first dim to flatten (default = 1).
end_dim – last dim to flatten (default = -1).
For example:
>>> import oneflow as flow >>> input = flow.Tensor(32, 1, 5, 5) >>> m = flow.nn.Flatten() >>> output = m(input) >>> output.shape oneflow.Size([32, 25])
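To make the shape rule explicit, the following NumPy sketch merges dims start_dim..end_dim (inclusive) into a single dim. Negative dims are allowed, as the default end_dim=-1 implies. The helper name `flatten` is hypothetical.

```python
import numpy as np


def flatten(x, start_dim=1, end_dim=-1):
    """NumPy sketch: merge dims start_dim..end_dim (inclusive) into one."""
    start = start_dim % x.ndim  # normalize negative dims
    end = end_dim % x.ndim
    merged = int(np.prod(x.shape[start:end + 1]))
    return x.reshape(x.shape[:start] + (merged,) + x.shape[end + 1:])


print(flatten(np.zeros((32, 1, 5, 5))).shape)  # (32, 25), as in the example above
```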
-
class
oneflow.nn.
Fold
(output_size: Union[int, Tuple[int, int]], kernel_size: Union[int, Tuple[int, int]], dilation: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, stride: Union[int, Tuple[int, int]] = 1)¶
-
class
oneflow.nn.
FusedBatchNorm1d
(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶ Applies Fused Batch Normalization over a 2D or 3D input. The formula is:
\[out = ReLU(BatchNorm(input) + addend)\]
The formula of Batch Normalization is:
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it’s common terminology to call this Temporal Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C)\) or \((N, C, L)\)
Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
For example:
>>> import oneflow as flow >>> import numpy as np >>> x = flow.Tensor(np.random.randn(20, 100)).to("cuda") # FusedBatchNorm currently supports GPU only. >>> m = flow.nn.FusedBatchNorm1d(num_features=100, eps=1e-5, momentum=0.1).to("cuda") >>> y = m(x, addend=None)
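Since FusedBatchNorm runs on GPU only, a CPU reference helps when reading the formula. The NumPy sketch below computes out = ReLU(BatchNorm(x) + addend) in training mode for an (N, C) or (N, C, L) input, including the documented running-statistics update. Several parts are assumptions and are not taken from OneFlow's kernel: the helper name `fused_bn_add_relu`, the assumption that `addend` has the input's shape, and the choice to store the unbiased batch variance in running_var, which follows PyTorch's convention. The 2d and 3d variants would reduce over (N, H, W) and (N, D, H, W) in the same way.

```python
import numpy as np


def fused_bn_add_relu(x, gamma, beta, addend=None, eps=1e-5,
                      running_mean=None, running_var=None, momentum=0.1):
    """Training-mode sketch of ReLU(BatchNorm(x) + addend) for (N, C) or (N, C, L) input."""
    axes = (0,) if x.ndim == 2 else (0, 2)          # statistics per channel C
    shape = (1, -1) if x.ndim == 2 else (1, -1, 1)  # broadcast per-channel vectors
    mean = x.mean(axis=axes)
    var = x.var(axis=axes)                          # biased estimator, used for normalization
    y = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    y = y * gamma.reshape(shape) + beta.reshape(shape)
    if addend is not None:
        y = y + addend
    if running_mean is not None:
        # documented rule: x_new = (1 - momentum) * x_hat + momentum * x_t
        running_mean[:] = (1 - momentum) * running_mean + momentum * mean
    if running_var is not None:
        n = x.size // x.shape[1]                    # elements per channel
        # assumption: running_var stores the unbiased batch variance, as PyTorch does
        running_var[:] = (1 - momentum) * running_var + momentum * var * n / (n - 1)
    return np.maximum(y, 0.0)


x = np.random.default_rng(0).standard_normal((20, 4))
print(fused_bn_add_relu(x, np.ones(4), np.zeros(4)).shape)  # (20, 4)
```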
-
class
oneflow.nn.
FusedBatchNorm2d
(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶ Applies Fused Batch Normalization over a 4D input. The formula is:
\[out = ReLU(BatchNorm(input) + addend)\]
The formula of Batch Normalization is:
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow >>> import numpy as np >>> x = flow.Tensor(np.random.randn(4, 2, 8, 3)).to("cuda") # FusedBatchNorm currently supports GPU only. >>> m = flow.nn.FusedBatchNorm2d(num_features=2, eps=1e-5, momentum=0.1).to("cuda") >>> y = m(x, addend=None)
-
class
oneflow.nn.
FusedBatchNorm3d
(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶ Applies Fused Batch Normalization over a 5D input. The formula is:
\[out = ReLU(BatchNorm(input) + addend)\]
The formula of Batch Normalization is:
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, D, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow >>> import numpy as np >>> x = flow.Tensor(np.random.randn(3, 2, 5, 8, 4)).to("cuda") # FusedBatchNorm currently supports GPU only. >>> m = flow.nn.FusedBatchNorm3d(num_features=2, eps=1e-5, momentum=0.1).to("cuda") >>> y = m(x, addend=None)
-
class
oneflow.nn.
FusedMLP
(in_features: int, hidden_features: Tuple[int], out_features: int, hidden_dropout_rate: Optional[Tuple[float]] = None, out_dropout_rate: float = 0.0, skip_final_activation=False)¶ Applies a sequence of linear transformations with ReLU activation to the incoming data, where each layer computes \(y = ReLU(xA^T + b)\)
- Parameters
in_features – size of each input sample
hidden_features – a tuple of the hidden sizes of each Linear layer
out_features – the output size of the final Linear layer
hidden_dropout_rate – a tuple of the dropout rates of each hidden layer
out_dropout_rate – the dropout rate of the final Linear layer
- Shape:
Input: \((N, *, H_{in})\) where \(*\) means any number of additional dimensions and \(H_{in} = {in\_features}\)
Output: \((N, *, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = {out\_features}\).
- Attr:
skip_final_activation
: Whether to skip the activation of the final layer. Default: False.
For example:
>>> import numpy as np >>> import oneflow as flow >>> m = flow.nn.FusedMLP(128, [256, 512], 1024).to("cuda") >>> input = flow.Tensor(np.random.randn(1, 128)).to("cuda") >>> output = m(input) >>> output.size() oneflow.Size([1, 1024])
-
add_parameters
() → None¶ Registers the parameters of the FusedMLP module.
-
bias
(i)¶ Returns the i-th bias.
-
biases
()¶ Returns the list of biases of the FusedMLP module.
-
weight
(i)¶ Returns the i-th weight.
-
weights
()¶ Returns the list of weights of the FusedMLP module.
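Each layer of FusedMLP computes \(y = ReLU(xA^T + b)\), and the layers are chained. The NumPy sketch below shows that chain in inference mode, with dropout omitted. Three things are assumptions rather than facts taken from the text above: the helper name `fused_mlp_forward`, the nn.Linear-style weight layout (out_features, in_features), and the reading of skip_final_activation as "no ReLU after the last Linear layer".

```python
import numpy as np


def fused_mlp_forward(x, weights, biases, skip_final_activation=False):
    """Inference-mode sketch of FusedMLP: each layer computes ReLU(x @ W.T + b); dropout omitted."""
    out = x
    last = len(weights) - 1
    for i, (w, b) in enumerate(zip(weights, biases)):
        out = out @ w.T + b  # w has shape (out_features_i, in_features_i), as in nn.Linear
        if i < last or not skip_final_activation:
            out = np.maximum(out, 0.0)
    return out


rng = np.random.default_rng(0)
sizes = [128, 256, 512, 1024]  # in_features, hidden_features..., out_features
weights = [rng.standard_normal((o, i)) * 0.05 for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]
print(fused_mlp_forward(rng.standard_normal((1, 128)), weights, biases).shape)  # (1, 1024)
```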
-
class
oneflow.nn.
GELU
¶ GELU activation operator.
The equation is:
\[out = 0.5 * x * (1 + tanh(\sqrt{\frac{2}{\pi}} * (x + 0.044715x^{3})))\]
- Parameters
x (oneflow.Tensor) – Input Tensor
- Returns
A Tensor.
- Return type
oneflow.Tensor
For example:
>>> import numpy as np >>> import oneflow as flow >>> x = np.array([-0.5, 0, 0.5]).astype(np.float32) >>> input = flow.Tensor(x) >>> gelu = flow.nn.GELU() >>> out = gelu(input) >>> out tensor([-0.1543, 0.0000, 0.3457], dtype=oneflow.float32)
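The equation above is the tanh-based form of GELU. The NumPy sketch below evaluates it directly, using the hypothetical helper name `gelu_tanh`. The text above does not say whether OneFlow's kernel uses this approximation or the exact erf form. For the three inputs in the example, both forms agree to four decimal places.

```python
import numpy as np


def gelu_tanh(x):
    """The tanh-based GELU formula shown above, element-wise."""
    x = np.asarray(x, dtype=np.float64)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


print(gelu_tanh([-0.5, 0.0, 0.5]))  # close to [-0.1543, 0.0, 0.3457] from the example
```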
-
class
oneflow.nn.
GLU
(dim: Optional[int] = -1)¶ The GLU (gated linear unit) activation.
- Parameters
input (Tensor, float) – input tensor.
dim (int, optional) – dimension on which to split the input. Default: -1
- Shape:
Input: \((\ast_1, N, \ast_2)\) where * means any number of additional dimensions
Output: \((\ast_1, M, \ast_2)\) where \(M=N/2\)
The formula is:
\[\text{GLU}(input) = \text{GLU}(a, b) = a \otimes \sigma(b)\]
Note
Here input is split in half along dim to form a and b, \(\sigma\) is the sigmoid function, and ⊗ is the element-wise product between matrices.
For example:
>>> import oneflow as flow >>> import oneflow.nn as nn >>> m = nn.GLU() >>> x = flow.tensor([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=flow.float32) >>> y = m(x) >>> y tensor([[0.9526, 1.9640], [4.9954, 5.9980]], dtype=oneflow.float32)
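The split-and-gate rule can be reproduced in NumPy as a check on the example output. The helper name `glu` is hypothetical. As the Shape section implies (M = N/2), the size along dim must be even.

```python
import numpy as np


def glu(x, dim=-1):
    """Split x in half along dim into (a, b) and return a * sigmoid(b)."""
    x = np.asarray(x, dtype=np.float64)
    if x.shape[dim] % 2 != 0:
        raise ValueError("the size of `dim` must be even")
    a, b = np.split(x, 2, axis=dim)
    return a * (1.0 / (1.0 + np.exp(-b)))


print(glu([[1, 2, 3, 4], [5, 6, 7, 8]]))  # approx. [[0.9526, 1.9640], [4.9954, 5.9980]]
```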
-
class
oneflow.nn.
GRU
(*args, **kwargs)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/_modules/torch/nn/modules/rnn.html#GRU.
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array}\end{split}\]where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, \(h_{(t-1)}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(r_t\), \(z_t\), \(n_t\) are the reset, update, and new gates, respectively. \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
In a multilayer GRU, the input \(x^{(l)}_t\) of the \(l\) -th layer (\(l >= 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\) where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability dropout.
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden states. See the Inputs/Outputs sections below for details. Default: False
dropout – If non-zero, introduces a Dropout layer on the outputs of each GRU layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional GRU. Default: False
- Inputs: input, h_0
input: tensor of shape \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence.
h_0: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for each element in the batch. Defaults to zeros if not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{out} ={} & \text{hidden\_size} \end{aligned}\end{split}\]
- Outputs: output, h_n
output: tensor of shape \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the GRU, for each t.
h_n: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for each element in the batch.
-
weight_ih_l[k]
the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
-
weight_hh_l[k]
the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
-
bias_ih_l[k]
the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
-
bias_hh_l[k]
the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional GRUs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when
batch_first=False
:output.view(seq_len, batch, num_directions, hidden_size)
.For example:
>>> import oneflow as flow >>> import numpy as np >>> rnn = flow.nn.GRU(10, 20, 2) >>> input = flow.tensor(np.random.randn(5, 3, 10), dtype=flow.float32) >>> h0 = flow.tensor(np.random.randn(2, 3, 20), dtype=flow.float32) >>> output, hn = rnn(input, h0) >>> output.size() oneflow.Size([5, 3, 20])
-
class
oneflow.nn.
GRUCell
(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html.
A gated recurrent unit (GRU) cell
\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
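One step of these equations can be written in NumPy using the stacked weight layout given under weight_ih and weight_hh below, (W_ir|W_iz|W_in) and (W_hr|W_hz|W_hn). Initialization follows the Note's \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) with \(k = 1/\text{hidden\_size}\). This is an illustrative sketch, not OneFlow's implementation. The helper names `init_gru_cell` and `gru_cell` are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def init_gru_cell(input_size, hidden_size, rng):
    """Draws every weight and bias from U(-sqrt(k), sqrt(k)) with k = 1 / hidden_size."""
    bound = np.sqrt(1.0 / hidden_size)

    def uniform(*shape):
        return rng.uniform(-bound, bound, size=shape)

    return (uniform(3 * hidden_size, input_size),   # weight_ih: (W_ir|W_iz|W_in)
            uniform(3 * hidden_size, hidden_size),  # weight_hh: (W_hr|W_hz|W_hn)
            uniform(3 * hidden_size),               # bias_ih
            uniform(3 * hidden_size))               # bias_hh


def gru_cell(x, h, weight_ih, weight_hh, bias_ih, bias_hh):
    """One GRU step h -> h' following the equations above."""
    i_r, i_z, i_n = np.split(x @ weight_ih.T + bias_ih, 3, axis=-1)
    h_r, h_z, h_n = np.split(h @ weight_hh.T + bias_hh, 3, axis=-1)
    r = sigmoid(i_r + h_r)          # reset gate
    z = sigmoid(i_z + h_z)          # update gate
    n = np.tanh(i_n + r * h_n)      # new gate; r only scales the hidden-side term
    return (1.0 - z) * n + z * h    # h'


rng = np.random.default_rng(0)
params = init_gru_cell(10, 20, rng)
hx = gru_cell(rng.standard_normal((3, 10)), np.zeros((3, 20)), *params)
print(hx.shape)  # (3, 20)
```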
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, hidden
input : tensor containing input features
hidden : tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
- Outputs: h’
h’ : tensor containing the next hidden state for each element in the batch
- Shape:
input: \((N, H_{in})\) or \((H_{in})\) tensor containing input features where \(H_{in}\) = input_size.
hidden: \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state where \(H_{out}\) = hidden_size. Defaults to zero if not provided.
output: \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.
-
weight_ih
¶ the learnable input-hidden weights, of shape (3*hidden_size, input_size)
-
weight_hh
¶ the learnable hidden-hidden weights, of shape (3*hidden_size, hidden_size)
-
bias_ih
¶ the learnable input-hidden bias, of shape (3*hidden_size)
-
bias_hh
¶ the learnable hidden-hidden bias, of shape (3*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
For example:
>>> import oneflow as flow >>> import oneflow.nn as nn >>> rnn = nn.GRUCell(10, 20) >>> input = flow.randn(6, 3, 10) >>> hx = flow.randn(3, 20) >>> hx = rnn(input[0], hx) >>> hx.size() oneflow.Size([3, 20])
-
class
oneflow.nn.
GroupNorm
(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.GroupNorm.html.
Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. The mean and standard-deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size num_channels if affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
This layer uses statistics computed from input data in both training and evaluation modes.
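The grouping can be expressed as a reshape to (N, num_groups, -1) followed by normalization along the last axis. The NumPy sketch below works this way, and because channels are contiguous, each row holds exactly one group's channels. The helper name `group_norm` is hypothetical, and the sketch is illustrative rather than OneFlow's kernel.

```python
import numpy as np


def group_norm(x, num_groups, weight=None, bias=None, eps=1e-5):
    """NumPy sketch of GroupNorm for (N, C, *) input."""
    n, c = x.shape[:2]
    if c % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    g = x.reshape(n, num_groups, -1)         # each row: one group's channels and positions
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)      # biased estimator
    y = ((g - mean) / np.sqrt(var + eps)).reshape(x.shape)
    shape = (1, c) + (1,) * (x.ndim - 2)     # per-channel affine parameters
    if weight is not None:
        y = y * weight.reshape(shape)
    if bias is not None:
        y = y + bias.reshape(shape)
    return y


x = np.random.default_rng(0).standard_normal((20, 6, 10, 10))
print(group_norm(x, 3).shape)  # (20, 6, 10, 10)
```

With num_groups equal to num_channels, every group is a single channel (instance-norm style). With num_groups=1, all channels of a sample are normalized together (layer-norm style).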
- Parameters
num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels expected in input
eps – a value added to the denominator for numerical stability. Default: 1e-5
affine – a boolean value that when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
- Shape:
Input: \((N, C, *)\) where \(C=\text{num\_channels}\)
Output: \((N, C, *)\) (same shape as input)
For example:
>>> import oneflow as flow >>> import numpy as np >>> input = flow.Tensor(np.random.randn(20, 6, 10, 10)) >>> # Separate 6 channels into 3 groups >>> m = flow.nn.GroupNorm(3, 6) >>> # Separate 6 channels into 6 groups (equivalent to InstanceNorm) >>> m = flow.nn.GroupNorm(6, 6) >>> # Put all 6 channels into a single group (equivalent to LayerNorm) >>> m = flow.nn.GroupNorm(1, 6) >>> # Activating the module >>> output = m(input)
-
class
oneflow.nn.
Hardshrink
(lambd: float = 0.5, inplace: bool = False)¶ The Hardshrink activation.
The formula is:
\[\begin{split}\text{Hardshrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\ x, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
- Parameters
lambd – the \(\lambda\) value for the Hardshrink formulation. Default: 0.5
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np >>> import oneflow as flow >>> x = np.array([-1.1, 0, 0.2, 0.5]).astype(np.float32) >>> input = flow.Tensor(x) >>> hardshrink = flow.nn.Hardshrink(lambd=0.5) >>> out = hardshrink(input) >>> out tensor([-1.1000, 0.0000, 0.0000, 0.0000], dtype=oneflow.float32)
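Because both non-zero cases return x, the formula reduces to a single mask on \(|x| > \lambda\). The NumPy sketch below uses that mask. The helper name `hardshrink` is hypothetical. Note that the boundary \(|x| = \lambda\) maps to 0, which is why 0.5 becomes 0 in the example above.

```python
import numpy as np


def hardshrink(x, lambd=0.5):
    """Keeps x where |x| > lambd and zeroes it elsewhere (|x| == lambd maps to 0)."""
    x = np.asarray(x)
    return np.where(np.abs(x) > lambd, x, 0.0)


print(hardshrink(np.array([-1.1, 0.0, 0.2, 0.5], dtype=np.float32)))  # -1.1, 0, 0, 0
```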
-
class
oneflow.nn.
Hardsigmoid
(inplace: bool = False)¶ Applies the element-wise function:
\[\begin{split}\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{ if } x \le -3 \\ 1 & \text{ if } x \ge +3 \\ \frac{x}{6} + \frac{1}{2} & \text{ otherwise } \\ \end{cases}\end{split}\]
- Parameters
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np >>> import oneflow as flow >>> x = np.array([-0.5, 0, 0.5]).astype(np.float32) >>> input = flow.Tensor(x) >>> hardsigmoid = flow.nn.Hardsigmoid() >>> out = hardsigmoid(input) >>> out tensor([0.4167, 0.5000, 0.5833], dtype=oneflow.float32)
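The three cases collapse into one clamp, because \(x/6 + 1/2\) reaches 0 at x = -3 and 1 at x = 3. The NumPy sketch below uses this, with the hypothetical helper name `hardsigmoid`.

```python
import numpy as np


def hardsigmoid(x):
    """0 for x <= -3, 1 for x >= 3, x / 6 + 1 / 2 in between."""
    return np.clip(np.asarray(x, dtype=np.float64) / 6.0 + 0.5, 0.0, 1.0)


print(hardsigmoid([-0.5, 0.0, 0.5]))  # approx. [0.4167, 0.5, 0.5833], as in the example above
```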
-
class
oneflow.nn.
Hardswish
(inplace: bool = False)¶ Applies the hardswish function, element-wise, as described in the paper: Searching for MobileNetV3.
\[\begin{split}\text{Hardswish}(x) = \begin{cases} 0 & \text{ if } x \le -3 \\ x & \text{ if } x \ge +3 \\ x*(x+3)/6 & \text{ otherwise } \\ \end{cases}\end{split}\]
- Parameters
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np >>> import oneflow as flow >>> x = np.array([-0.5, 0, 0.5]).astype(np.float32) >>> input = flow.Tensor(x) >>> hardswish = flow.nn.Hardswish() >>> out = hardswish(input) >>> out tensor([-0.2083, 0.0000, 0.2917], dtype=oneflow.float32)
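Clamping (x + 3) to [0, 6] reproduces all three cases of the formula: the result is 0 below -3, x·(x+3)/6 in between, and x·6/6 = x above 3. The following NumPy sketch uses this identity, with the hypothetical helper name `hardswish`.

```python
import numpy as np


def hardswish(x):
    """x * clip(x + 3, 0, 6) / 6, equivalent to the piecewise definition above."""
    x = np.asarray(x, dtype=np.float64)
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


print(hardswish([-0.5, 0.0, 0.5]))  # approx. [-0.2083, 0.0, 0.2917], as in the example above
```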
-
class
oneflow.nn.
Hardtanh
(min_val: float = -1, max_val: float = 1, inplace: bool = False, min_value: Optional[float] = None, max_value: Optional[float] = None)¶ Applies the HardTanh function element-wise.
HardTanh is defined as:
\[\begin{split}\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \\ \end{cases}\end{split}\]
The range of the linear region \([-1, 1]\) can be adjusted using min_val and max_val.
- Parameters
min_val – minimum value of the linear region range. Default: -1
max_val – maximum value of the linear region range. Default: 1
inplace – can optionally do the operation in-place. Default: False
Keyword arguments min_value and max_value have been deprecated in favor of min_val and max_val.
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np >>> import oneflow as flow >>> m = flow.nn.Hardtanh() >>> arr = np.array([0.2, 0.3, 3.0, 4.0]) >>> x = flow.Tensor(arr) >>> out = m(x) >>> out tensor([0.2000, 0.3000, 1.0000, 1.0000], dtype=oneflow.float32)
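With adjustable bounds, HardTanh is simply a clamp to [min_val, max_val]. The minimal NumPy sketch below shows this. The helper name `hardtanh` is hypothetical, and rejecting min_val > max_val is a choice made for this sketch only.

```python
import numpy as np


def hardtanh(x, min_val=-1.0, max_val=1.0):
    """Identity on [min_val, max_val], clamped to the nearest bound outside it."""
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    return np.clip(np.asarray(x, dtype=np.float64), min_val, max_val)


print(hardtanh([0.2, 0.3, 3.0, 4.0]))  # [0.2, 0.3, 1.0, 1.0], as in the example above
```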
-
class
oneflow.nn.
Identity
(*args, **kwargs)¶ A placeholder identity operator that is argument-insensitive.
- Parameters
args – any argument (unused)
kwargs – any keyword argument (unused)
For example:
import numpy as np
import oneflow as flow

m = flow.nn.Identity()
input = flow.Tensor(np.random.rand(2, 3, 4, 5))
output = m(input)  # output = input
-
class
oneflow.nn.
InstanceNorm1d
(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.InstanceNorm1d.html.
Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size) if affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm1d and LayerNorm are very similar, but have some subtle differences. InstanceNorm1d is applied on each channel of channeled data like multidimensional time series, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm1d usually does not apply an affine transform.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, L)\)
Output: \((N, C, L)\) (same shape as input)
For example:
>>> import oneflow as flow >>> import numpy as np >>> # Without Learnable Parameters >>> m = flow.nn.InstanceNorm1d(100) >>> # With Learnable Parameters >>> m = flow.nn.InstanceNorm1d(100, affine=True) >>> x = flow.Tensor(np.random.randn(20, 100, 40)) >>> output = m(x)
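In the default configuration (track_running_stats=False), instance statistics are a mean and a biased variance computed per (sample, channel) pair over the remaining axes. The NumPy sketch below shows this for any (N, C, *) input, so the same reduction over (H, W) covers the 2D case. The helper name `instance_norm` is hypothetical, and running statistics are not modelled.

```python
import numpy as np


def instance_norm(x, weight=None, bias=None, eps=1e-5):
    """NumPy sketch of InstanceNorm (no running statistics) for (N, C, *) input."""
    axes = tuple(range(2, x.ndim))             # reduce over L (1d) or H, W (2d) only
    mean = x.mean(axis=axes, keepdims=True)    # one mean per (sample, channel)
    var = x.var(axis=axes, keepdims=True)      # biased estimator
    y = (x - mean) / np.sqrt(var + eps)
    shape = (1, -1) + (1,) * (x.ndim - 2)
    if weight is not None:                     # learnable weight, only when affine=True
        y = y * weight.reshape(shape)
    if bias is not None:
        y = y + bias.reshape(shape)
    return y


x = np.random.default_rng(0).standard_normal((20, 100, 40))
print(instance_norm(x).shape)  # (20, 100, 40)
```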
-
class
oneflow.nn.
InstanceNorm2d
(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.InstanceNorm2d.html.
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size) if
affine
isTrue
. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm2d and LayerNorm are very similar, but have some subtle differences. InstanceNorm2d is applied on each channel of channeled data like RGB images, but LayerNorm is usually applied on an entire sample, often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm2d usually does not apply an affine transform.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> # Without Learnable Parameters
>>> m = flow.nn.InstanceNorm2d(100)
>>> # With Learnable Parameters
>>> m = flow.nn.InstanceNorm2d(100, affine=True)
>>> x = flow.Tensor(np.random.randn(20, 100, 35, 45))
>>> output = m(x)
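The running-statistics rule from the momentum note is an exponential moving average: each update moves the estimate a fraction momentum of the way toward the newly observed value. The plain-Python sketch below only illustrates that arithmetic for one scalar statistic. It starts from an estimate of 0 and feeds it the same observation three times. It does not show how OneFlow stores or computes its buffers, and update_running is a hypothetical name.

```python
def update_running(estimate, observed, momentum=0.1):
    """x_hat_new = (1 - momentum) * x_hat + momentum * x_t (hypothetical helper)."""
    return (1.0 - momentum) * estimate + momentum * observed

running_mean = 0.0
for observed_mean in [2.0, 2.0, 2.0]:
    running_mean = update_running(running_mean, observed_mean)
print(round(running_mean, 4))  # 0.542
```

After k identical observations v, starting from 0, the estimate equals v * (1 - (1 - momentum)**k). Here that is 2 * (1 - 0.9**3) = 0.542, so it approaches the observed value only gradually.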
-
class oneflow.nn.InstanceNorm3d(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = False, track_running_stats: bool = False)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.InstanceNorm3d.html.
Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of channels of the input) if affine is True. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
By default, this layer uses instance statistics computed from input data in both training and evaluation modes.
If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
Note
This momentum argument is different from the one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
Note
InstanceNorm3d and LayerNorm are very similar, but have some subtle differences. InstanceNorm3d is applied on each channel of channeled data like 3D models with RGB color, but LayerNorm is usually applied on an entire sample, often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm3d usually does not apply an affine transform.
- Parameters
num_features – \(C\) from an expected input of size \((N, C, D, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> # Without Learnable Parameters
>>> m = flow.nn.InstanceNorm3d(100)
>>> # With Learnable Parameters
>>> m = flow.nn.InstanceNorm3d(100, affine=True)
>>> x = flow.Tensor(np.random.randn(20, 100, 35, 45, 10))
>>> output = m(x)
-
class oneflow.nn.KLDivLoss(reduction: str = 'mean', log_target: bool = False)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.KLDivLoss.html.
The Kullback-Leibler divergence loss measure
Kullback-Leibler divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are interpreted as probabilities by default, but could be considered as log-probabilities with log_target set to True.
This criterion expects a target Tensor of the same size as the input Tensor.
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{ l_1,\dots,l_N \}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right)\]
where the index \(N\) spans all dimensions of input and \(L\) has the same shape as input. If reduction is not 'none' (default 'mean'), then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
In the default reduction mode 'mean', the losses are averaged for each minibatch over observations as well as over dimensions. 'batchmean' mode gives the correct KL divergence, where losses are averaged over the batch dimension only. 'mean' mode's behavior will be changed to the same as 'batchmean' in the next major release.
- Parameters
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'sum' | 'mean'. 'none': no reduction will be applied. 'batchmean': the sum of the output will be divided by the batch size. 'sum': the output will be summed. 'mean': the output will be divided by the number of elements in the output. Default: 'mean'
log_target (bool, optional) – Specifies whether target is passed in the log space. Default: False
Note
reduction='mean' doesn't return the true KL divergence value; please use reduction='batchmean', which aligns with the mathematical definition of KL divergence. In the next major release, 'mean' will be changed to be the same as 'batchmean'.
- Shape:
Input: \((N, *)\) where \(*\) means any number of additional dimensions
Target: \((N, *)\), same shape as the input
Output: scalar by default. If reduction is 'none', then \((N, *)\), the same shape as the input
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor([-0.9021705, 0.08798598, 1.04686249], dtype=flow.float32)
>>> target = flow.tensor([1.22386942, -0.89729659, 0.01615712], dtype=flow.float32)
>>> m = flow.nn.KLDivLoss(reduction="none", log_target=False)
>>> out = m(input, target)
>>> out
tensor([ 1.3514, 0.0000, -0.0836], dtype=oneflow.float32)
>>> m = flow.nn.KLDivLoss(reduction="mean", log_target=False)
>>> out = m(input, target)
>>> out
tensor(0.4226, dtype=oneflow.float32)
>>> m = flow.nn.KLDivLoss(reduction="sum", log_target=True)
>>> out = m(input, target)
>>> out
tensor(5.7801, dtype=oneflow.float32)
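The pointwise formula and the four reductions can be written as a NumPy reference. This is a sketch, not OneFlow's code, and kl_div is a hypothetical name. In the example above the element whose target is negative produced 0.0000. The sketch therefore assumes entries with y <= 0 contribute 0 when log_target=False. When log_target=True, the target holds log y, so the term becomes exp(y) * (y - x).

```python
import numpy as np

def kl_div(log_input, target, reduction="mean", log_target=False):
    """Reference KL-divergence loss (hypothetical helper)."""
    x = np.asarray(log_input, dtype=np.float64)
    y = np.asarray(target, dtype=np.float64)
    if log_target:
        # target already holds log-probabilities
        loss = np.exp(y) * (y - x)
    else:
        # y * (log y - x); entries with y <= 0 are assumed to contribute 0
        safe_y = np.where(y > 0, y, 1.0)
        loss = np.where(y > 0, y * (np.log(safe_y) - x), 0.0)
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    if reduction == "mean":
        return loss.mean()  # divides by the number of elements
    if reduction == "batchmean":
        return loss.sum() / x.shape[0]  # divides by the batch size only
    raise ValueError(f"unknown reduction: {reduction}")

# Two rows, each a probability distribution.
p = np.array([[0.5, 0.5], [0.9, 0.1]])    # target probabilities
q = np.array([[0.25, 0.75], [0.5, 0.5]])  # model probabilities
log_q = np.log(q)                         # the input must be log-probabilities
print(kl_div(log_q, p, "batchmean"))  # average KL(p || q) per row
print(kl_div(log_q, p, "mean"))       # half of that here: divided by 4 elements, not 2 rows
```

The two printed values show why 'batchmean' matches the mathematical definition. 'mean' also divides by the number of classes per row.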
-
class oneflow.nn.L1Loss(reduction: str = 'mean')
This operator computes the L1 Loss between each element in input and target.
The equation is:
if reduction = "none":
\[output = |Target - Input|\]
if reduction = "mean":
\[output = \frac{1}{n}\sum_{i=1}^n|Target_i - Input_i|\]
if reduction = "sum":
\[output = \sum_{i=1}^n|Target_i - Input_i|\]
- Parameters
input (oneflow.Tensor) – the input Tensor.
target (oneflow.Tensor) – the target Tensor.
reduction (str) – the reduction type, one of "none", "mean", "sum". Default: "mean".
- Returns
The result Tensor.
- Return type
oneflow.Tensor
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor([[1, 1, 1], [2, 2, 2], [7, 7, 7]], dtype=flow.float32)
>>> target = flow.tensor([[4, 4, 4], [4, 4, 4], [4, 4, 4]], dtype=flow.float32)
>>> m = flow.nn.L1Loss(reduction="none")
>>> out = m(input, target)
>>> out
tensor([[3., 3., 3.],
        [2., 2., 2.],
        [3., 3., 3.]], dtype=oneflow.float32)
>>> m_mean = flow.nn.L1Loss(reduction="mean")
>>> out = m_mean(input, target)
>>> out
tensor(2.6667, dtype=oneflow.float32)
>>> m_sum = flow.nn.L1Loss(reduction="sum")
>>> out = m_sum(input, target)
>>> out
tensor(24., dtype=oneflow.float32)
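A NumPy sketch of the three equations, reusing the data from the example above: 'mean' divides the 24 total by the n = 9 elements. l1_loss is a hypothetical helper, not OneFlow API.

```python
import numpy as np

def l1_loss(pred, target, reduction="mean"):
    """|target - pred| followed by the chosen reduction (hypothetical helper)."""
    diff = np.abs(np.asarray(target, dtype=np.float64) - np.asarray(pred, dtype=np.float64))
    if reduction == "none":
        return diff
    if reduction == "mean":
        return diff.mean()
    if reduction == "sum":
        return diff.sum()
    raise ValueError(f"unknown reduction: {reduction}")

pred = [[1, 1, 1], [2, 2, 2], [7, 7, 7]]
target = [[4, 4, 4]] * 3
print(l1_loss(pred, target, "sum"))        # 24.0
print(round(l1_loss(pred, target), 4))     # 2.6667
```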
-
class oneflow.nn.LSTM(*args, **kwargs)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/_modules/torch/nn/modules/rnn.html#LSTM.
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\ f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\ o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\ c_t = f_t \odot c_{t-1} + i_t \odot g_t \\ h_t = o_t \odot \tanh(c_t) \\ \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, \(h_{t-1}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively. \(\sigma\) is the sigmoid function, and \(\odot\) is the Hadamard product.
In a multilayer LSTM, the input \(x^{(l)}_t\) of the \(l\)-th layer (\(l \geq 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\), where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability dropout.
If proj_size > 0 is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of \(h_t\) will be changed from hidden_size to proj_size (dimensions of \(W_{hi}\) will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: \(h_t = W_{hr}h_t\). Note that as a consequence of this, the output of the LSTM network will have a different shape as well. See the Inputs/Outputs sections below for the exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128.
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional LSTM. Default: False
proj_size – If > 0, will use LSTM with projections of corresponding size. Default: 0
- Inputs: input, (h_0, c_0)
input: tensor of shape \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence.
h_0: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.
c_0: tensor of shape \((D * \text{num\_layers}, N, H_{cell})\) containing the initial cell state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{cell} ={} & \text{hidden\_size} \\ H_{out} ={} & \text{proj\_size if } \text{proj\_size}>0 \text{ otherwise hidden\_size} \\ \end{aligned}\end{split}\]
- Outputs: output, (h_n, c_n)
output: tensor of shape \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the LSTM, for each t.
h_n: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for each element in the batch.
c_n: tensor of shape \((D * \text{num\_layers}, N, H_{cell})\) containing the final cell state for each element in the batch.
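The shape rules above reduce to a small amount of bookkeeping, sketched here in plain Python. lstm_shapes is a hypothetical helper that only applies the table (D, H_out, H_cell), and it does not run an LSTM. Its first call uses the same sizes as the example in this entry.

```python
def lstm_shapes(seq_len, batch, hidden_size, num_layers=1,
                bidirectional=False, proj_size=0, batch_first=False):
    """Return the (output, h_n, c_n) shapes from the size table above (hypothetical helper)."""
    d = 2 if bidirectional else 1
    h_out = proj_size if proj_size > 0 else hidden_size
    h_cell = hidden_size
    if batch_first:
        output = (batch, seq_len, d * h_out)
    else:
        output = (seq_len, batch, d * h_out)
    # batch_first does not change the layout of the hidden and cell states.
    h_n = (d * num_layers, batch, h_out)
    c_n = (d * num_layers, batch, h_cell)
    return output, h_n, c_n

print(lstm_shapes(5, 3, 20, num_layers=2))
# ((5, 3, 20), (2, 3, 20), (2, 3, 20))
print(lstm_shapes(5, 3, 20, num_layers=2, bidirectional=True, proj_size=8, batch_first=True))
# ((3, 5, 16), (4, 3, 8), (4, 3, 20))
```

The second call shows the two effects of proj_size. h_n shrinks to proj_size while c_n keeps hidden_size.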
- Variables
weight_ih_l[k] – the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, num_directions * proj_size) for k > 0.
weight_hh_l[k] – the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size). If proj_size > 0 was specified, the shape will be (4*hidden_size, proj_size).
bias_ih_l[k] – the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
weight_hr_l[k] – the learnable projection weights of the \(\text{k}^{th}\) layer of shape (proj_size, hidden_size). Only present when proj_size > 0 was specified.
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional LSTMs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view(seq_len, batch, num_directions, hidden_size).
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> rnn = flow.nn.LSTM(10, 20, 2)
>>> input = flow.tensor(np.random.randn(5, 3, 10), dtype=flow.float32)
>>> h0 = flow.tensor(np.random.randn(2, 3, 20), dtype=flow.float32)
>>> c0 = flow.tensor(np.random.randn(2, 3, 20), dtype=flow.float32)
>>> output, (hn, cn) = rnn(input, (h0, c0))
>>> output.size()
oneflow.Size([5, 3, 20])
-
check_forward_args(input: oneflow.Tensor, hidden: Tuple[oneflow.Tensor, oneflow.Tensor], batch_sizes: Optional[oneflow.Tensor])
-
get_expected_cell_size(input: oneflow.Tensor, batch_sizes: Optional[oneflow.Tensor]) → Tuple[int, int, int]
-
class oneflow.nn.LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device=None, dtype=None)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/stable/generated/torch.nn.LSTMCell.html.
A long short-term memory (LSTM) cell.
\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]
where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.
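One step of these equations can be sketched in NumPy. The sketch assumes the gate blocks inside the stacked (4*hidden_size, ...) weights are ordered i|f|g|o, as the W_ii|W_if|W_ig|W_io layout in the LSTM weight descriptions suggests. lstm_cell_step and sigmoid are hypothetical helpers, not OneFlow functions. The weights are random draws from U(-sqrt(k), sqrt(k)) with k = 1/hidden_size, following the initialization note of this entry, rather than a real module's parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """One LSTM cell update for a batch (hypothetical helper).

    x: (batch, input_size); h, c: (batch, hidden_size)
    w_ih: (4*hidden_size, input_size); w_hh: (4*hidden_size, hidden_size)
    b_ih, b_hh: (4*hidden_size,); gate blocks assumed stacked as i|f|g|o.
    """
    gates = x @ w_ih.T + b_ih + h @ w_hh.T + b_hh
    i, f, g, o = np.split(gates, 4, axis=1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_next = f * c + i * g          # elementwise (Hadamard) products
    h_next = o * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(0)
input_size, hidden_size, batch = 10, 20, 3
bound = np.sqrt(1.0 / hidden_size)
w_ih = rng.uniform(-bound, bound, (4 * hidden_size, input_size))
w_hh = rng.uniform(-bound, bound, (4 * hidden_size, hidden_size))
b_ih = rng.uniform(-bound, bound, 4 * hidden_size)
b_hh = rng.uniform(-bound, bound, 4 * hidden_size)
x = rng.normal(size=(batch, input_size))
h0 = np.zeros((batch, hidden_size))
c0 = np.zeros((batch, hidden_size))
h1, c1 = lstm_cell_step(x, h0, c0, w_ih, w_hh, b_ih, b_hh)
print(h1.shape, c1.shape)  # (3, 20) (3, 20)
```

Because h' is o * tanh(c'), with o in (0, 1), every entry of the new hidden state lies strictly between -1 and 1.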
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- Inputs: input, (h_0, c_0)
input of shape (batch, input_size) or (input_size): tensor containing input features
h_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial hidden state
c_0 of shape (batch, hidden_size) or (hidden_size): tensor containing the initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
- Outputs: (h_1, c_1)
h_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next hidden state
c_1 of shape (batch, hidden_size) or (hidden_size): tensor containing the next cell state
- Variables
weight_ih – the learnable input-hidden weights, of shape (4*hidden_size, input_size)
weight_hh – the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size)
bias_ih – the learnable input-hidden bias, of shape (4*hidden_size)
bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
For example:
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> rnn = nn.LSTMCell(10, 20) # (input_size, hidden_size)
>>> input = flow.randn(2, 3, 10) # (time_steps, batch, input_size)
>>> hx = flow.randn(3, 20) # (batch, hidden_size)
>>> cx = flow.randn(3, 20)
>>> hx, cx = rnn(input[0], (hx, cx))
>>> hx.size()
oneflow.Size([3, 20])
-
class oneflow.nn.LayerNorm(normalized_shape: Union[int, Tuple[int], oneflow.Size], eps: float = 1e-05, elementwise_affine: bool = True)
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.
\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated over the last D dimensions of the input, where D is the number of dimensions of normalized_shape; those trailing dimensions must match normalized_shape. \(\gamma\) and \(\beta\) are learnable affine transform parameters of shape normalized_shape if elementwise_affine is True. The standard-deviation is calculated via the biased estimator.
Note
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies a per-element scale and bias with elementwise_affine.
This layer uses statistics computed from input data in both training and evaluation modes.
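A NumPy sketch of the trailing-dimension rule follows. With normalized_shape = 2 it normalizes each pair in the last axis. With (2, 2) it normalizes each 2 x 2 block over the last two axes. layer_norm and its weight/bias arguments are hypothetical stand-ins, not OneFlow's implementation. The per-element affine parameters have shape normalized_shape and broadcast over the leading axes.

```python
import numpy as np

def layer_norm(x, normalized_shape, eps=1e-5, weight=None, bias=None):
    """Normalize over the trailing len(normalized_shape) axes (hypothetical helper)."""
    if isinstance(normalized_shape, int):
        normalized_shape = (normalized_shape,)  # a single int acts like a singleton list
    normalized_shape = tuple(normalized_shape)
    x = np.asarray(x, dtype=np.float64)
    if x.shape[-len(normalized_shape):] != normalized_shape:
        raise ValueError("trailing dimensions of x must match normalized_shape")
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)  # biased estimator
    y = (x - mean) / np.sqrt(var + eps)
    if weight is not None:
        y = y * weight  # per-element scale, shape normalized_shape
    if bias is not None:
        y = y + bias    # per-element shift, shape normalized_shape
    return y

# A two-element row normalizes to values just below +/-1 (eps keeps them off exactly 1).
print(layer_norm([[-0.16046895, -1.03667831]], 2))
```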
- Parameters
normalized_shape (int or list or oneflow.Size) – input shape from an expected input of size
\[[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]\]
If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
eps – a value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine – a boolean value that when set to True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
- Shape:
Input: \((N, *)\)
Output: \((N, *)\) (same shape as input)
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> input_arr = np.array(
...     [
...         [
...             [[-0.16046895, -1.03667831], [-0.34974465, 0.26505867]],
...             [[-1.24111986, -0.53806001], [1.72426331, 0.43572459]],
...         ],
...         [
...             [[-0.77390957, -0.42610624], [0.16398858, -1.35760343]],
...             [[1.07541728, 0.11008703], [0.26361224, -0.48663723]],
...         ],
...     ],
...     dtype=np.float32,
... )
>>> x = flow.Tensor(input_arr)
>>> m = flow.nn.LayerNorm(2)
>>> y = m(x).numpy()
>>> y
array([[[[ 0.99997395, -0.99997395],
         [-0.999947 , 0.999947 ]],
        [[-0.99995965, 0.9999595 ],
         [ 0.99998784, -0.99998784]]],
       [[[-0.9998348 , 0.99983466],
         [ 0.9999914 , -0.9999914 ]],
        [[ 0.9999785 , -0.9999785 ],
         [ 0.9999646 , -0.9999646 ]]]], dtype=float32)
-
elementwise_affine: bool
-
eps: float
-
normalized_shape: Tuple[int, ...]
-
class oneflow.nn.LeakyReLU(negative_slope: float = 0.01, inplace: bool = False)
Applies the element-wise function:
\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{negative\_slope} \times x, & \text{ otherwise } \end{cases}\end{split}\]
- Parameters
negative_slope – Controls the angle of the negative slope. Default: 1e-2
inplace – can optionally do the operation in-place. Default: False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.LeakyReLU(0.1)
>>> arr = np.array([0.2, 0.3, 3.0, 4.0])
>>> x = flow.Tensor(arr)
>>> out = m(x)
>>> out
tensor([0.2000, 0.3000, 3.0000, 4.0000], dtype=oneflow.float32)
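The example above only uses positive inputs, so its output equals its input. The NumPy sketch below applies the piecewise definition to negative values as well. leaky_relu is a hypothetical helper, not OneFlow's kernel.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    """x where x >= 0, negative_slope * x otherwise (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, negative_slope * x)

print(leaky_relu([-2.0, -0.5, 0.0, 3.0], negative_slope=0.1).tolist())
# [-0.2, -0.05, 0.0, 3.0]
```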
-
class oneflow.nn.Linear(in_features: int, out_features: int, bias: bool = True)
Applies a linear transformation to the incoming data: \(y = xA^T + b\)
- Parameters
in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True
- Shape:
Input: \((N, *, H_{in})\) where \(*\) means any number of additional dimensions and \(H_{in} = {in\_features}\)
Output: \((N, *, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = {out\_features}\).
- Attr:
weight: the learnable weights of the module of shape \(({out\_features}, {in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = 1 / {in\_features}\)
bias: the learnable bias of the module of shape \(({out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = 1 / {in\_features}\)
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.Linear(20, 30, False)
>>> input = flow.Tensor(np.random.randn(128, 20))
>>> output = m(input)
>>> output.size()
oneflow.Size([128, 30])
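In NumPy, the transformation \(y = xA^T + b\) and the initialization rule look as follows. weight has shape (out_features, in_features) and is transposed in the product. Any leading dimensions of the input pass through unchanged. This is an illustrative sketch with a hypothetical linear helper and freshly drawn parameters, not OneFlow's module.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features = 20, 30
bound = np.sqrt(1.0 / in_features)  # sqrt(k) with k = 1 / in_features
weight = rng.uniform(-bound, bound, (out_features, in_features))
bias = rng.uniform(-bound, bound, out_features)

def linear(x, weight, bias=None):
    """y = x A^T + b over the last axis (hypothetical helper)."""
    y = x @ weight.T
    return y if bias is None else y + bias

x = rng.normal(size=(128, in_features))
print(linear(x, weight, bias).shape)  # (128, 30)
print(linear(x[None], weight).shape)  # (1, 128, 30): extra leading dims pass through
```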
-
class oneflow.nn.LogSigmoid
Applies the element-wise function:
\[\text{LogSigmoid}(x) = \log\left(\frac{ 1 }{ 1 + \exp(-x)}\right)\]
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-0.5, 0, 0.5]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> logsigmoid = flow.nn.LogSigmoid()
>>> out = logsigmoid(input)
>>> out
tensor([-0.9741, -0.6931, -0.4741], dtype=oneflow.float32)
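Evaluating the formula literally overflows for large negative x, because exp(-x) becomes inf. An equivalent form is \(-\log(1 + e^{-x})\), which NumPy computes stably as -logaddexp(0, -x). This sketch is one standard way to write it and is not a claim about OneFlow's kernel. log_sigmoid is a hypothetical name.

```python
import numpy as np

def log_sigmoid(x):
    """log(1 / (1 + exp(-x))) = -log(1 + exp(-x)), computed without overflow."""
    return -np.logaddexp(0.0, -np.asarray(x, dtype=np.float64))

print(log_sigmoid([-0.5, 0.0, 0.5]).round(4).tolist())  # [-0.9741, -0.6931, -0.4741]
print(log_sigmoid([-1000.0]).tolist())                  # [-1000.0], still finite
```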
-
class oneflow.nn.LogSoftmax(dim: Optional[int] = None)
Applies the LogSoftmax function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as:
\[\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right) = x_i - \log({ \sum_j \exp(x_j)})\]
- Parameters
dim (int) – A dimension along which LogSoftmax will be computed.
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.LogSoftmax(dim=1)
>>> x = flow.Tensor(
...     np.array(
...         [[ 0.4296, -1.1957,  2.5463],
...          [ 1.2552, -1.5747,  0.6923]]
...     )
... )
>>> out = m(x)
>>> out
tensor([[-2.2513, -3.8766, -0.1346],
        [-0.4877, -3.3176, -1.0506]], dtype=oneflow.float32)
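The right-hand form \(x_i - \log\sum_j \exp(x_j)\) is usually evaluated after subtracting the maximum along dim. That keeps exp from overflowing and does not change the result. The NumPy sketch below uses that common max-shift trick with a hypothetical log_softmax helper. It is not necessarily how OneFlow computes it.

```python
import numpy as np

def log_softmax(x, dim):
    """x_i - log(sum_j exp(x_j)) along `dim`, using a max shift for stability."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - x.max(axis=dim, keepdims=True)  # the shift cancels out in the result
    return shifted - np.log(np.exp(shifted).sum(axis=dim, keepdims=True))

x = [[0.4296, -1.1957, 2.5463],
     [1.2552, -1.5747, 0.6923]]
out = log_softmax(x, dim=1)
print(np.exp(out).sum(axis=1))  # each row of probabilities sums to 1
```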
-
class oneflow.nn.MSELoss(reduction: str = 'mean')
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.MSELoss.html.
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input \(x\) and target \(y\).
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left( x_n - y_n \right)^2,\]
where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
\(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.
The mean operation still operates over all the elements, and divides by \(n\).
The division by \(n\) can be avoided if one sets reduction = 'sum'.
- Parameters
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Shape:
Input: \((N, *)\) where \(*\) means any number of additional dimensions
Target: \((N, *)\), same shape as the input
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(
...     [[-0.02557137, 0.03101675, 1.37493674],
...      [0.25599439, -1.08372561, -0.21006816]], dtype=flow.float32)
>>> target = flow.tensor(
...     [[-1.53105064, -0.68137555, 0.5931354],
...      [-0.49158347, 0.93673637, 0.1324141]], dtype=flow.float32)
>>> m = flow.nn.MSELoss(reduction="none")
>>> out = m(input, target)
>>> out
tensor([[2.2665, 0.5075, 0.6112],
        [0.5589, 4.0823, 0.1173]], dtype=oneflow.float32)
>>> m = flow.nn.MSELoss(reduction="mean")
>>> out = m(input, target)
>>> out
tensor(1.3573, dtype=oneflow.float32)
>>> m = flow.nn.MSELoss(reduction="sum")
>>> out = m(input, target)
>>> out
tensor(8.1436, dtype=oneflow.float32)
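The NumPy sketch below reproduces the example's reduced values. It makes the point that 'mean' divides by all n elements (6 here), not by the batch size N (2). mse_loss is a hypothetical helper, not OneFlow API.

```python
import numpy as np

def mse_loss(pred, target, reduction="mean"):
    """(x_n - y_n)^2 followed by the chosen reduction (hypothetical helper)."""
    sq = (np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)) ** 2
    if reduction == "none":
        return sq
    if reduction == "mean":
        return sq.mean()  # sum / n, where n counts every element
    if reduction == "sum":
        return sq.sum()
    raise ValueError(f"unknown reduction: {reduction}")

pred = [[-0.02557137, 0.03101675, 1.37493674],
        [0.25599439, -1.08372561, -0.21006816]]
target = [[-1.53105064, -0.68137555, 0.5931354],
          [-0.49158347, 0.93673637, 0.1324141]]
print(round(mse_loss(pred, target, "mean"), 4), round(mse_loss(pred, target, "sum"), 4))
# 1.3573 8.1436
```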
-
class oneflow.nn.MarginRankingLoss(margin: float = 0.0, reduction: str = 'mean')
Creates a criterion that measures the loss given inputs \(x1\), \(x2\), two 1D mini-batch Tensors, and a label 1D mini-batch tensor \(y\) (containing 1 or -1).
If \(y = 1\), then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice-versa for \(y = -1\).
The loss function for each sample in the mini-batch is:
\[\text{loss}(x1, x2, y) = \max(0, -y * (x1 - x2) + \text{margin})\]
- Parameters
margin (float, optional) – Has a default value of \(0\).
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Shape:
x1 : \((N, D)\) where N is the batch size and D is the size of a sample.
x2 : \((N, D)\) where N is the batch size and D is the size of a sample.
Target: \((N)\)
Output: scalar. If reduction is 'none', then \((N)\).
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x1 = flow.tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), dtype=flow.float32)
>>> x2 = flow.tensor(np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]]), dtype=flow.float32)
>>> target = flow.tensor(np.array([[1, -1, 1],[-1, 1, -1], [1, 1, 1]]), dtype=flow.float32)
>>> m = flow.nn.MarginRankingLoss(margin=1.0, reduction="none")
>>> out = m(x1, x2, target)
>>> out
tensor([[2., 1., 0.],
        [3., 0., 5.],
        [0., 0., 0.]], dtype=oneflow.float32)
>>> m = flow.nn.MarginRankingLoss(margin=0.3, reduction="sum")
>>> out = m(x1, x2, target)
>>> out
tensor(8.2000, dtype=oneflow.float32)
>>> m = flow.nn.MarginRankingLoss(margin=10, reduction="mean")
>>> out = m(x1, x2, target)
>>> out
tensor(8.3333, dtype=oneflow.float32)
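Applied elementwise, the loss formula reproduces the example's numbers. A pair is penalized unless the input that y says should be larger beats the other by at least margin. margin_ranking_loss is a hypothetical NumPy helper, not OneFlow API.

```python
import numpy as np

def margin_ranking_loss(x1, x2, y, margin=0.0, reduction="mean"):
    """max(0, -y * (x1 - x2) + margin), then the chosen reduction (hypothetical helper)."""
    x1, x2, y = (np.asarray(a, dtype=np.float64) for a in (x1, x2, y))
    loss = np.maximum(0.0, -y * (x1 - x2) + margin)
    if reduction == "none":
        return loss
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    raise ValueError(f"unknown reduction: {reduction}")

x1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x2 = [[2, 2, 2]] * 3
y = [[1, -1, 1], [-1, 1, -1], [1, 1, 1]]
print(margin_ranking_loss(x1, x2, y, margin=1.0, reduction="none").tolist())
# [[2.0, 1.0, 0.0], [3.0, 0.0, 5.0], [0.0, 0.0, 0.0]]
```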
-
class oneflow.nn.MaxPool1d(kernel_size: Union[int, Tuple[int]], stride: Optional[Union[int, Tuple[int]]] = None, padding: Union[int, Tuple[int]] = 0, dilation: Union[int, Tuple[int]] = 1, return_indices: bool = False, ceil_mode: bool = False)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.MaxPool1d.html.
Applies a 1D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, L)\) and output \((N, C, L_{out})\) can be precisely described as:
\[out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\_size} - 1} input(N_i, C_j, stride \times k + m)\]
If padding is non-zero, then the input is implicitly padded with the minimum value on both sides for padding number of points. dilation is the stride between the elements within the sliding window. This link has a nice visualization of the pooling parameters.
Note
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- Parameters
kernel_size – The size of the sliding window, must be > 0.
stride – The stride of the sliding window, must be > 0. Default value is kernel_size.
padding – Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
dilation – The stride between elements within a sliding window, must be > 0.
return_indices – If True, will return the argmax along with the max values. Useful for torch.nn.MaxUnpool1d later
ceil_mode – If True, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
- Shape:
Input: \((N, C, L_{in})\)
Output: \((N, C, L_{out})\), where
\[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> of_maxpool1d = flow.nn.MaxPool1d(kernel_size=3, padding=1, stride=1)
>>> x = flow.Tensor(np.random.randn(1, 4, 4))
>>> y = of_maxpool1d(x)
>>> y.shape
oneflow.Size([1, 4, 4])
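The L_out formula and the ceil_mode note are easier to follow in code, so here is a small NumPy reference for an (N, C, L) array. It is not how OneFlow implements pooling. pool_out_len and max_pool1d are hypothetical helper names. pool_out_len applies the Note's rule that a window starting in the right padding is dropped. max_pool1d supports only the default floor mode and pads with -inf, so padded positions never win the max.

```python
import math
import numpy as np

def pool_out_len(l_in, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Output length from the L_out formula above (hypothetical helper)."""
    stride = kernel_size if stride is None else stride
    span = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    out = (math.ceil(span / stride) if ceil_mode else span // stride) + 1
    if ceil_mode and (out - 1) * stride >= l_in + padding:
        out -= 1  # the last window would start in the right padding, so drop it
    return out

def max_pool1d(x, kernel_size, stride=None, padding=0, dilation=1):
    """Reference 1D max pooling over the last axis, floor mode only (hypothetical helper)."""
    stride = kernel_size if stride is None else stride
    x = np.asarray(x, dtype=np.float64)
    padded = np.pad(x, [(0, 0), (0, 0), (padding, padding)], constant_values=-np.inf)
    l_out = pool_out_len(x.shape[-1], kernel_size, stride, padding, dilation)
    out = np.empty(x.shape[:-1] + (l_out,))
    for k in range(l_out):
        # Window k covers positions stride*k + dilation*m for m = 0..kernel_size-1.
        idx = k * stride + dilation * np.arange(kernel_size)
        out[..., k] = padded[..., idx].max(axis=-1)
    return out

x = np.random.randn(1, 4, 4)
print(max_pool1d(x, 3, stride=1, padding=1).shape)  # (1, 4, 4), as in the example above
```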
-
class oneflow.nn.MaxPool2d(kernel_size: Union[int, Tuple[int, int]], stride: Optional[Union[int, Tuple[int, int]]] = None, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, return_indices: bool = False, ceil_mode: bool = False)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.MaxPool2d.html.
Applies a 2D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
\[\begin{split}\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned}\end{split}\]
If padding is non-zero, then the input is implicitly padded with the minimum value on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
Note
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
- The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
- Parameters
- Parameters
kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit minimum value padding to be added on both sides
dilation – a parameter that controls the stride of elements in the window
return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later
ceil_mode – when True, will use ceil instead of floor to compute the output shape
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor\]
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.MaxPool2d(kernel_size=3, padding=1, stride=1)
>>> x = flow.Tensor(np.random.randn(1, 4, 4, 4))
>>> y = m(x)
>>> y.shape
oneflow.Size([1, 4, 4, 4])
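To make the pooling formula and the H_out / W_out expressions concrete, here is a minimal NumPy sketch of 2D max pooling. It is not OneFlow's kernel: the helper names maxpool2d_reference and _pair are hypothetical, only floor mode (ceil_mode=False) is handled, and -inf stands in for the "minimum value" padding. Dilation is applied as a step of dilation between window elements.

```python
import numpy as np


def _pair(v):
    # Accept a single int or a (height, width) pair, like the layer's parameters.
    return (v, v) if isinstance(v, int) else tuple(v)


def maxpool2d_reference(x, kernel_size, stride=None, padding=0, dilation=1):
    """Naive max pooling over the last two axes of an (N, C, H, W) array.

    Floor mode only. Padding uses -inf so padded points never win the max.
    """
    kh, kw = _pair(kernel_size)
    sh, sw = _pair(kernel_size if stride is None else stride)  # stride defaults to kernel_size
    ph, pw = _pair(padding)
    dh, dw = _pair(dilation)

    n, c, h, w = x.shape
    # Output size from the H_out / W_out formulas in the Shape section.
    h_out = (h + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    w_out = (w + 2 * pw - dw * (kw - 1) - 1) // sw + 1

    xp = np.pad(x.astype(np.float64), ((0, 0), (0, 0), (ph, ph), (pw, pw)),
                constant_values=-np.inf)
    out = np.empty((n, c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # Window rows are sh*i + m*dh for m in 0..kh-1 (columns likewise).
            win = xp[:, :, sh * i: sh * i + dh * (kh - 1) + 1: dh,
                     sw * j: sw * j + dw * (kw - 1) + 1: dw]
            out[:, :, i, j] = win.max(axis=(2, 3))
    return out


x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
print(maxpool2d_reference(x, kernel_size=2))  # window maxima: 5, 7, 13, 15
```

With OneFlow installed, one could compare the result against flow.nn.MaxPool2d with the same arguments applied to flow.tensor(x).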
-
class
oneflow.nn.
MaxPool3d
(kernel_size: Union[int, Tuple[int, int, int]], stride: Optional[Union[int, Tuple[int, int, int]]] = None, padding: Union[int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, return_indices: bool = False, ceil_mode: bool = False)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.MaxPool3d.html.
Applies a 3D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and kernel_size \((kD, kH, kW)\) can be precisely described as:
\[\begin{split}\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times d + k, \text{stride[1]} \times h + m, \text{stride[2]} \times w + n) \end{aligned}\end{split}\]
If padding is non-zero, then the input is implicitly padded with the minimum value on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.
Note
When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the depth, height and width dimension
a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
- Parameters
kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit minimum value padding to be added on all three sides
dilation – a parameter that controls the stride of elements in the window
return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool3d later
ceil_mode – when True, will use ceil instead of floor to compute the output shape
- Shape:
Input: \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> of_maxpool3d = flow.nn.MaxPool3d(kernel_size=3, padding=1, stride=1)
>>> x = flow.Tensor(np.random.randn(1, 4, 4, 4, 4))
>>> y = of_maxpool3d(x)
>>> y.shape
oneflow.Size([1, 4, 4, 4, 4])
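The D_out / H_out / W_out formulas apply independently per spatial dimension. The sketch below evaluates them for int or 3-tuple arguments. The floor-mode branch follows the formulas above; the ceil_mode branch is an assumption borrowed from PyTorch's shape rule (round up, then drop a last window that would start in the right padding, as the Note describes). The helper names are hypothetical.

```python
def _triple(v):
    # A single int applies to depth, height and width alike.
    return (v, v, v) if isinstance(v, int) else tuple(v)


def maxpool3d_output_shape(in_dhw, kernel_size, stride=None, padding=0,
                           dilation=1, ceil_mode=False):
    """Return (D_out, H_out, W_out) for a MaxPool3d-style layer."""
    k = _triple(kernel_size)
    s = k if stride is None else _triple(stride)  # stride defaults to kernel_size
    p = _triple(padding)
    d = _triple(dilation)
    out = []
    for size, ki, si, pi, di in zip(in_dhw, k, s, p, d):
        span = size + 2 * pi - di * (ki - 1) - 1
        o = (span + (si - 1 if ceil_mode else 0)) // si + 1
        if ceil_mode and (o - 1) * si >= size + pi:
            o -= 1  # last window would start inside the right padding
        out.append(o)
    return tuple(out)


print(maxpool3d_output_shape((4, 4, 4), kernel_size=3, stride=1, padding=1))  # (4, 4, 4)
```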
-
class
oneflow.nn.
MinMaxObserver
(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric', per_layer_quantization: bool = True)¶ Compute the quantization parameters of the input tensor.
First compute the max and min values of the input tensor:
\[ \begin{align}\begin{aligned}& max\_value = max(input)\\& min\_value = min(input)\end{aligned}\end{align} \]
Then compute the scale and zero_point with the following equations:
if quantization_scheme == “symmetric”:
\[ \begin{align}\begin{aligned}& denom = 2^{quantization\_bit - 1} - 1\\& scale = max(|max\_value|,|min\_value|) / denom\\& zero\_point = 0\end{aligned}\end{align} \]
elif quantization_scheme == “affine”:
\[ \begin{align}\begin{aligned}& denom = 2^{quantization\_bit} - 1\\& scale = (max\_value - min\_value) / denom\\& zero\_point = -min\_value / scale\end{aligned}\end{align} \]
If per_layer_quantization is False, then the shape of scale and zero_point will be (input.shape[0],).
- Parameters
input (oneflow.Tensor) – the input value(s), in oneflow.float32.
quantization_formula (str) – Either “google” or “cambricon”. Defaults to “google”.
quantization_bit (int) – Quantize input to uintX / intX, where X can be in the range [2, 8]. Defaults to 8.
quantization_scheme (str) – “symmetric” or “affine”, i.e. quantize to a signed / unsigned integer. Defaults to “symmetric”.
per_layer_quantization (bool) – True for per-layer quantization, False for per-channel quantization. Defaults to True.
- Returns
The scale and zero_point of input tensor.
- Return type
Tuple[oneflow.Tensor, oneflow.Tensor]
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)
>>> input_tensor = flow.tensor(
...    weight, dtype=flow.float32
... )
>>> quantization_bit = 8
>>> quantization_scheme = "symmetric"
>>> quantization_formula = "google"
>>> per_layer_quantization = True
>>> min_max_observer = flow.nn.MinMaxObserver(quantization_formula=quantization_formula, quantization_bit=quantization_bit,
... quantization_scheme=quantization_scheme, per_layer_quantization=per_layer_quantization)
>>> scale, zero_point = min_max_observer(
...    input_tensor, )
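The scale / zero_point equations above can be transcribed directly into NumPy, which is handy for sanity-checking an observer's output. This is a sketch of the “google”-formula equations only (the “cambricon” formula is not covered). The function name is hypothetical, and returning per-layer results with shape (1,) is an assumption about the layout; per-channel statistics are reduced over every axis except axis 0, giving shape (input.shape[0],) as stated above.

```python
import numpy as np


def min_max_scale_zero_point(x, quantization_bit=8, quantization_scheme="symmetric",
                             per_layer_quantization=True):
    """NumPy transcription of the MinMaxObserver equations ("google" formula)."""
    if per_layer_quantization:
        max_value = np.array([x.max()])
        min_value = np.array([x.min()])
    else:
        # One statistic per slice along axis 0 (per channel).
        flat = x.reshape(x.shape[0], -1)
        max_value = flat.max(axis=1)
        min_value = flat.min(axis=1)

    if quantization_scheme == "symmetric":
        denom = 2 ** (quantization_bit - 1) - 1
        scale = np.maximum(np.abs(max_value), np.abs(min_value)) / denom
        zero_point = np.zeros_like(scale)
    elif quantization_scheme == "affine":
        denom = 2 ** quantization_bit - 1
        scale = (max_value - min_value) / denom
        zero_point = -min_value / scale
    else:
        raise ValueError("quantization_scheme must be 'symmetric' or 'affine'")
    return scale, zero_point


w = np.array([[-1.0, 0.5], [0.25, 2.0]], dtype=np.float32)
print(min_max_scale_zero_point(w))                                # per-layer, symmetric
print(min_max_scale_zero_point(w, per_layer_quantization=False))  # one scale per row
```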
-
class
oneflow.nn.
Mish
(inplace: bool = False)¶ Applies the element-wise function:
\[\text{Mish}(x) = x * \text{Tanh}(\text{Softplus}(x))\]
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([1, 2, 3]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> mish = flow.nn.Mish()
>>> out = mish(input)
>>> out
tensor([0.8651, 1.9440, 2.9865], dtype=oneflow.float32)
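The Mish formula is easy to reproduce outside the framework. The NumPy sketch below (a hypothetical helper, not OneFlow's implementation) computes Softplus(x) = log(1 + exp(x)) with np.logaddexp(0, x), which avoids overflow for large x, and reproduces the values in the example above.

```python
import numpy as np


def mish(x):
    # Softplus(x) = log(1 + exp(x)); logaddexp(0, x) evaluates it without overflow.
    return x * np.tanh(np.logaddexp(0.0, x))


print(mish(np.array([1.0, 2.0, 3.0], dtype=np.float32)))
```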
-
class
oneflow.nn.
ModuleDict
(modules: Optional[Mapping[str, oneflow.nn.module.Module]] = None)¶ Holds submodules in a dictionary.
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ModuleDict.html?#torch.nn.ModuleDict.
ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods.
ModuleDict is an ordered dictionary that respects the order of insertion and, in update(), the order of the merged OrderedDict, dict (started from Python 3.6) or another ModuleDict (the argument to update()).
Note that update() with other unordered mapping types (e.g., Python’s plain dict before Python version 3.6) does not preserve the order of the merged mapping.
- Parameters
modules (iterable, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module)
>>> import oneflow.nn as nn
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.choices = nn.ModuleDict({
...                 'conv': nn.Conv2d(10, 10, 3),
...                 'pool': nn.MaxPool2d(3)
...         })
...         self.activations = nn.ModuleDict([
...                 ['lrelu', nn.LeakyReLU()],
...                 ['prelu', nn.PReLU()]
...         ])
...     def forward(self, x, choice, act):
...         x = self.choices[choice](x)
...         x = self.activations[act](x)
...         return x
>>> model = MyModule()
>>> model.choices
ModuleDict(
  (conv): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=(3, 3), stride=(3, 3), padding=(0, 0), dilation=(1, 1))
)
-
class
oneflow.nn.
ModuleList
(modules: Optional[Iterable[oneflow.nn.module.Module]] = None)¶ Holds submodules in a list.
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ModuleList.html?#torch.nn.ModuleList.
ModuleList can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all Module methods.
- Parameters
modules (iterable, optional) – an iterable of modules to add
>>> import oneflow.nn as nn
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])
...     def forward(self, x):
...         # ModuleList can act as an iterable, or be indexed using ints
...         for i, l in enumerate(self.linears):
...             x = self.linears[i // 2](x) + l(x)
...         return x
>>> model = MyModule()
>>> model.linears
ModuleList(
  (0): Linear(in_features=10, out_features=10, bias=True)
  (1): Linear(in_features=10, out_features=10, bias=True)
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): Linear(in_features=10, out_features=10, bias=True)
  (4): Linear(in_features=10, out_features=10, bias=True)
  (5): Linear(in_features=10, out_features=10, bias=True)
  (6): Linear(in_features=10, out_features=10, bias=True)
  (7): Linear(in_features=10, out_features=10, bias=True)
  (8): Linear(in_features=10, out_features=10, bias=True)
  (9): Linear(in_features=10, out_features=10, bias=True)
)
-
class
oneflow.nn.
MovingAverageMinMaxObserver
(training: bool = False, stop_update_after_iters: int = 0, quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric', momentum: float = 0)¶ Compute the quantization parameters based on the moving average of the input tensor’s min and max values.
First compute the moving_max and moving_min values of the input tensor:
if quantization_scheme == “symmetric”:
\[ \begin{align}\begin{aligned}& moving\_max = moving\_max * momentum + |max(input)| * (1 - momentum)\\& moving\_min = moving\_max\end{aligned}\end{align} \]
elif quantization_scheme == “affine”:
\[ \begin{align}\begin{aligned}& moving\_max = moving\_max * momentum + max(input) * (1 - momentum)\\& moving\_min = moving\_min * momentum + min(input) * (1 - momentum)\end{aligned}\end{align} \]
The moving averages of the min and max values are initialized with the min and max of the first input batch.
Then compute the scale and zero_point with the following equations:
if quantization_scheme == “symmetric”:
\[ \begin{align}\begin{aligned}& denom = 2^{quantization\_bit - 1} - 1\\& scale = moving\_max / denom\\& zero\_point = 0\end{aligned}\end{align} \]
elif quantization_scheme == “affine”:
\[ \begin{align}\begin{aligned}& denom = 2^{quantization\_bit} - 1\\& scale = (moving\_max - moving\_min) / denom\\& zero\_point = -moving\_min / scale\end{aligned}\end{align} \]
Note
current_train_step can be directly assigned to an optimizer's (e.g. SGD) step.
- Parameters
input (oneflow.Tensor) – the input value(s), in oneflow.float32.
current_train_step_tensor (oneflow.Tensor) – records the train step for quantization aware training.
stop_update_after_iters (int) – stop recording the train step for quantization aware training once the train iteration exceeds stop_update_after_iters.
training (bool) – whether the model is in the training state. Defaults to False.
quantization_formula (str) – Either “google” or “cambricon”. Defaults to “google”.
quantization_bit (int) – Quantize input to uintX / intX, where X can be in the range [2, 8]. Defaults to 8.
quantization_scheme (str) – “symmetric” or “affine”, i.e. quantize to a signed / unsigned integer. Defaults to “symmetric”.
momentum (float) – Smoothing parameter for the exponential moving average operation. Defaults to 0.
- Returns
The scale and zero_point of input tensor.
- Return type
Tuple[oneflow.Tensor, oneflow.Tensor]
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)
>>> input_tensor = flow.tensor(
...    weight, dtype=flow.float32
... )
>>> current_train_step_tensor = flow.tensor(
...   np.zeros((1,)).astype(np.float32),
...    dtype=flow.int64,
... )
>>> momentum = 0.95
>>> quantization_bit = 8
>>> quantization_scheme = "symmetric"
>>> quantization_formula = "google"
>>> moving_average_min_max_observer = flow.nn.MovingAverageMinMaxObserver(training=True, stop_update_after_iters=1,
...     quantization_formula=quantization_formula, quantization_bit=quantization_bit,
...     quantization_scheme=quantization_scheme, momentum=momentum,
... )
>>> (scale, zero_point) = moving_average_min_max_observer(
...    input_tensor,
...    current_train_step_tensor,
... )
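The moving-average update and the scale / zero_point equations above can be followed step by step with a small stateful sketch. MovingAverageMinMaxSketch is a hypothetical class, not part of OneFlow. It transcribes the documented equations literally (including |max(input)| for the symmetric case), assumes per-layer statistics and the “google” formula, and ignores current_train_step and stop_update_after_iters.

```python
import numpy as np


class MovingAverageMinMaxSketch:
    """Literal NumPy transcription of the moving-average equations above (hypothetical)."""

    def __init__(self, momentum=0.95, quantization_bit=8, quantization_scheme="symmetric"):
        self.momentum = momentum
        self.quantization_bit = quantization_bit
        self.scheme = quantization_scheme
        self.moving_max = None
        self.moving_min = None

    def update(self, x):
        if self.scheme == "symmetric":
            batch_max = abs(float(np.max(x)))  # |max(input)| as written in the equations
            batch_min = batch_max              # moving_min mirrors moving_max
        else:
            batch_max, batch_min = float(np.max(x)), float(np.min(x))
        if self.moving_max is None:
            # The first batch initializes the moving statistics.
            self.moving_max, self.moving_min = batch_max, batch_min
        else:
            m = self.momentum
            self.moving_max = self.moving_max * m + batch_max * (1 - m)
            self.moving_min = self.moving_min * m + batch_min * (1 - m)
        return self.scale_zero_point()

    def scale_zero_point(self):
        if self.scheme == "symmetric":
            denom = 2 ** (self.quantization_bit - 1) - 1
            return self.moving_max / denom, 0.0
        denom = 2 ** self.quantization_bit - 1
        scale = (self.moving_max - self.moving_min) / denom
        return scale, -self.moving_min / scale


obs = MovingAverageMinMaxSketch(momentum=0.5, quantization_scheme="affine")
obs.update(np.array([-1.0, 2.0]))
print(obs.update(np.array([-3.0, 4.0])))  # moving_max = 3, moving_min = -2
```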
-
reset_running_stats
() → None¶
-
class
oneflow.nn.
NLLLoss
(weight: Optional[oneflow.Tensor] = None, ignore_index: int = -100, reduction: str = 'mean')¶ The negative log likelihood loss. It is useful to train a classification problem with C classes.
The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later).
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.
The target that this loss expects should be a class index in the range \([0, C-1]\) where C = number of classes.
The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \mathbb{1},\]
where \(x\) is the input, \(y\) is the target, \(w\) is the weight, and \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\begin{split}\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{N} l_n, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
The loss can also be used for higher-dimensional inputs, such as 2D images, by providing an input of size \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes the NLL loss per pixel.
- Parameters
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Default: 'mean'
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(
... [[-0.1664078, -1.7256707, -0.14690138],
... [-0.21474946, 0.53737473, 0.99684894],
... [-1.135804, -0.50371903, 0.7645404]], dtype=flow.float32)
>>> target = flow.tensor(np.array([0, 1, 2]), dtype=flow.int32)
>>> m = flow.nn.NLLLoss(reduction="none")
>>> out = m(input, target)
>>> out
tensor([ 0.1664, -0.5374, -0.7645], dtype=oneflow.float32)
>>> m = flow.nn.NLLLoss(reduction="sum")
>>> out = m(input, target)
>>> out
tensor(-1.1355, dtype=oneflow.float32)
>>> m = flow.nn.NLLLoss(reduction="mean")
>>> out = m(input, target)
>>> out
tensor(-0.3785, dtype=oneflow.float32)
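The unreduced loss simply picks \(-x_{n,y_n}\) from each row, and 'mean' / 'sum' reduce those values. The NumPy sketch below (hypothetical helper, all class weights \(w_c = 1\) as in the formula, no ignore_index handling) reproduces the three outputs of the example above and also accepts the K-dimensional case, since np.take_along_axis gathers along the class axis whatever trailing dimensions follow.

```python
import numpy as np


def nll_loss(log_probs, target, reduction="mean"):
    """NLL loss with w_c = 1.

    log_probs: (N, C) or (N, C, d1, ..., dK); target: (N,) or (N, d1, ..., dK).
    """
    picked = np.take_along_axis(log_probs, np.expand_dims(target, 1), axis=1)
    losses = -np.squeeze(picked, axis=1)  # l_n = -x_{n, y_n}
    if reduction == "none":
        return losses
    if reduction == "sum":
        return losses.sum()
    if reduction == "mean":
        return losses.mean()  # (1/N) * sum_n l_n, since every w_c = 1
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


x = np.array([[-0.1664078, -1.7256707, -0.14690138],
              [-0.21474946, 0.53737473, 0.99684894],
              [-1.135804, -0.50371903, 0.7645404]], dtype=np.float32)
y = np.array([0, 1, 2])
print(nll_loss(x, y, "none"), nll_loss(x, y, "sum"), nll_loss(x, y, "mean"))
```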
-
class
oneflow.nn.
OFRecordBytesDecoder
(blob_name: str, name: Optional[str] = None)¶ This operator reads a tensor as bytes. The output might need a further decoding step, such as cv2.imdecode() for images or decode(“utf-8”) for characters, depending on the downstream task.
- Parameters
blob_name – The name of the target feature in OFRecord.
name – The name for this component in the graph.
input – the Tensor which might be provided by an OFRecordReader.
- Returns
The result Tensor encoded with bytes.
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> def example():
...     batch_size = 16
...     record_reader = flow.nn.OFRecordReader(
...         "dataset/",
...         batch_size=batch_size,
...         part_name_suffix_length=5,
...     )
...     val_record = record_reader()
...     bytesdecoder_img = flow.nn.OFRecordBytesDecoder("encoded")
...     image_bytes_batch = bytesdecoder_img(val_record)
...     image_bytes = image_bytes_batch.numpy()[0]
...     return image_bytes
>>> example()
array([255 216 255 ... 79 255 217], dtype=uint8)
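Each row returned by the decoder is a plain uint8 array, so the "further decoding" mentioned above happens in ordinary Python. The sketch below uses two hypothetical helpers: one turns a row into text, assuming the record holds UTF-8; the other checks for the JPEG start (255 216) and end (255 217) markers, which the row in the example above begins and ends with. For actual image decoding, the same array could be handed to an image library such as OpenCV's cv2.imdecode.

```python
import numpy as np


def bytes_to_text(byte_array):
    # Assumes the record field holds UTF-8 encoded text.
    return np.asarray(byte_array, dtype=np.uint8).tobytes().decode("utf-8")


def looks_like_jpeg(byte_array):
    # JPEG streams start with 0xFF 0xD8 (255 216) and end with 0xFF 0xD9 (255 217).
    b = np.asarray(byte_array, dtype=np.uint8)
    return bool(b.size >= 4 and b[0] == 0xFF and b[1] == 0xD8
                and b[-2] == 0xFF and b[-1] == 0xD9)


row = np.frombuffer("OneFlow".encode("utf-8"), dtype=np.uint8)
print(bytes_to_text(row), looks_like_jpeg(row))
```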
-
class
oneflow.nn.
OFRecordImageDecoder
(blob_name: str, color_space: str = 'BGR')¶
-
class
oneflow.nn.
OFRecordImageDecoderRandomCrop
(blob_name: str, color_space: str = 'BGR', num_attempts: int = 10, random_seed: Optional[int] = None, random_area: Sequence[float] = [0.08, 1.0], random_aspect_ratio: Sequence[float] = [0.75, 1.333333])¶
-
class
oneflow.nn.
OFRecordRawDecoder
(blob_name: str, shape: Sequence[int], dtype: oneflow._oneflow_internal.dtype, dim1_varying_length: bool = False, truncate: bool = False, auto_zero_padding: bool = False, name: Optional[str] = None)¶
-
class
oneflow.nn.
OFRecordReader
(ofrecord_dir: str, batch_size: int = 1, data_part_num: int = 1, part_name_prefix: str = 'part-', part_name_suffix_length: int = -1, random_shuffle: bool = False, shuffle_buffer_size: int = 1024, shuffle_after_epoch: bool = False, random_seed: int = -1, device: Optional[Union[oneflow._oneflow_internal.device, str]] = None, placement: Optional[oneflow._oneflow_internal.placement] = None, sbp: Optional[Union[oneflow._oneflow_internal.sbp.sbp, List[oneflow._oneflow_internal.sbp.sbp]]] = None, name: Optional[str] = None)¶
-
class
oneflow.nn.
PReLU
(num_parameters: int = 1, init: float = 0.25, device=None, dtype=None)¶ Applies the element-wise function:
\[PReLU(x) = \max(0,x) + a * \min(0,x)\]Here \(a\) is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter \(a\) across all input channels. If called with nn.PReLU(nChannels), a separate \(a\) is used for each input channel.
Note
Weight decay should not be used when learning \(a\) for good performance.
Note
Channel dim is the 2nd dim of input. When input has dims < 2, then there is no channel dim and the number of channels = 1.
- Parameters
num_parameters (int) – number of \(a\) to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels at input. Default: 1
init (float) – the initial value of \(a\). Default: 0.25
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
- Attr:
weight (Tensor): the learnable weights of shape (num_parameters).
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.PReLU()
>>> input = flow.tensor(np.asarray([[[[1, -2], [3, 4]]]]), dtype=flow.float32)
>>> print(m(input).numpy())
[[[[ 1.  -0.5]
   [ 3.   4. ]]]]
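The difference between one shared slope and one slope per channel comes down to broadcasting \(a\) along dim 1, the channel dim named in the Note above. The NumPy sketch below is a hypothetical helper, not OneFlow's implementation; with the default init of 0.25 it reproduces the example output.

```python
import numpy as np


def prelu(x, a):
    """PReLU(x) = max(0, x) + a * min(0, x), with a scalar or per-channel a."""
    x = np.asarray(x, dtype=np.float64)
    a = np.asarray(a, dtype=np.float64)
    if a.size > 1:
        # One slope per channel: reshape to (1, C, 1, ...) so it broadcasts along dim 1.
        shape = [1] * x.ndim
        shape[1] = a.size
        a = a.reshape(shape)
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)


print(prelu([[[[1, -2], [3, 4]]]], 0.25))
```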
-
class
oneflow.nn.
Parameter
¶
-
class
oneflow.nn.
ParameterDict
(parameters=None)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ParameterDict.html?#torch.nn.ParameterDict.
Holds parameters in a dictionary.
ParameterDict can be indexed like a regular Python dictionary, but parameters it contains are properly registered, and will be visible by all Module methods.
ParameterDict is an ordered dictionary that respects the order of insertion and, in update(), the order of the merged OrderedDict or another ParameterDict (the argument to update()).
Note that update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping.
- Parameters
parameters (iterable, optional) – a mapping (dictionary) of (string : Parameter) or an iterable of key-value pairs of type (string, Parameter)
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.params = nn.ParameterDict({
...                 'left': nn.Parameter(flow.randn(5, 10)),
...                 'right': nn.Parameter(flow.randn(5, 10))
...         })
...
...     def forward(self, x, choice):
...         x = self.params[choice].mm(x)
...         return x
>>> model = MyModule()
>>> model.params
ParameterDict(
    (left): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 5x10]
    (right): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 5x10]
)
-
class
oneflow.nn.
ParameterList
(parameters=None)¶ Holds parameters in a list.
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ParameterList.html?#torch.nn.ParameterList.
ParameterList can be indexed like a regular Python list, but parameters it contains are properly registered, and will be visible by all Module methods.
- Parameters
parameters (iterable, optional) – an iterable of Parameter to add
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.params = nn.ParameterList([nn.Parameter(flow.randn(10, 10)) for i in range(10)])
...
...     def forward(self, x):
...         # ParameterList can act as an iterable, or be indexed using ints
...         for i, p in enumerate(self.params):
...             x = self.params[i // 2].mm(x) + p.mm(x)
...         return x
>>> model = MyModule()
>>> model.params
ParameterList(
    (0): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (1): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (2): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (3): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (4): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (5): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (6): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (7): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (8): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
    (9): Parameter containing: [<class 'oneflow.nn.Parameter'> of size 10x10]
)
-
oneflow.nn.
PixelShuffle
¶
-
class
oneflow.nn.
Quantization
(quantization_formula: str = 'google', quantization_bit: int = 8, quantization_scheme: str = 'symmetric')¶ Simulates the quantize operation at inference time.
The output will be computed as:
if quantization_scheme == “symmetric”:
\[ \begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit - 1} - 1\\& quant\_min = -quant\_max\\& clamp(round(x / scale), quant\_min, quant\_max)\end{aligned}\end{align} \]
elif quantization_scheme == “affine”:
\[ \begin{align}\begin{aligned}& quant\_max = 2^{quantization\_bit} - 1\\& quant\_min = 0\\& (clamp(round(x / scale + zero\_point), quant\_min, quant\_max) - zero\_point)\end{aligned}\end{align} \]
- Parameters
quantization_bit (int) – Quantize input to uintX / intX, X can be in range [2, 8]. Defaults to 8.
quantization_scheme (str) – “symmetric” or “affine”, quantize to signed / unsigned integer. Defaults to “symmetric”.
quantization_formula (str) – Either “google” or “cambricon”. Defaults to “google”.
- Returns
Input tensor after quantize operation.
- Return type
oneflow.Tensor
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> weight = (np.random.random((2, 3, 4, 5)) - 0.5).astype(np.float32)
>>> input_tensor = flow.tensor(
...    weight, dtype=flow.float32
... )
>>> quantization_bit = 8
>>> quantization_scheme = "symmetric"
>>> quantization_formula = "google"
>>> per_layer_quantization = True
>>> min_max_observer = flow.nn.MinMaxObserver(quantization_formula=quantization_formula, quantization_bit=quantization_bit,
... quantization_scheme=quantization_scheme, per_layer_quantization=per_layer_quantization)
>>> quantization = flow.nn.Quantization(quantization_formula=quantization_formula, quantization_bit=quantization_bit,
... quantization_scheme=quantization_scheme)
>>> scale, zero_point = min_max_observer(
...    input_tensor,
... )
>>> output_tensor = quantization(
...    input_tensor,
...    scale,
...    zero_point,
... )
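The two quantization equations can be transcribed into NumPy to see the clamping ranges in action: \([-127, 127]\) for 8-bit symmetric and \([0, 255]\) for 8-bit affine (before subtracting zero_point). This is a hypothetical helper for the “google” formula only. np.round rounds halves to even; whether OneFlow's kernel breaks ties the same way is an assumption.

```python
import numpy as np


def quantize(x, scale, zero_point, quantization_bit=8, quantization_scheme="symmetric"):
    """NumPy transcription of the Quantization equations above."""
    x = np.asarray(x, dtype=np.float64)
    if quantization_scheme == "symmetric":
        quant_max = 2 ** (quantization_bit - 1) - 1
        quant_min = -quant_max
        return np.clip(np.round(x / scale), quant_min, quant_max)
    quant_max = 2 ** quantization_bit - 1
    quant_min = 0
    return np.clip(np.round(x / scale + zero_point), quant_min, quant_max) - zero_point


print(quantize([0.5, -1.0, 3.0], scale=0.01, zero_point=0))  # 3.0 saturates at 127
print(quantize([0.5, -1.0, 3.0], scale=0.02, zero_point=10, quantization_scheme="affine"))
```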
-
class
oneflow.nn.
RNN
(*args, **kwargs)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.RNN.html.
Applies a multi-layer Elman RNN with \(\tanh\) or \(\text{ReLU}\) non-linearity to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\text{ReLU}\) is used instead of \(\tanh\).
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional RNN. Default: False
- Inputs: input, h_0
input: tensor of shape \((L, N, H_{in})\) when batch_first=False or \((N, L, H_{in})\) when batch_first=True containing the features of the input sequence.
h_0: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the initial hidden state for each element in the batch. Defaults to zeros if not provided.
where:
\[\begin{split}\begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{out} ={} & \text{hidden\_size} \end{aligned}\end{split}\]
- Outputs: output, h_n
output: tensor of shape \((L, N, D * H_{out})\) when batch_first=False or \((N, L, D * H_{out})\) when batch_first=True containing the output features (h_t) from the last layer of the RNN, for each t.
h_n: tensor of shape \((D * \text{num\_layers}, N, H_{out})\) containing the final hidden state for each element in the batch.
-
weight_ih_l[k]
the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
-
weight_hh_l[k]
the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
-
bias_ih_l[k]
the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
-
bias_hh_l[k]
the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
Note
For bidirectional RNNs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when batch_first=False: output.view((seq_len, batch, num_directions, hidden_size)).
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> rnn = flow.nn.RNN(10, 20, 2)
>>> input = flow.tensor(np.random.randn(5, 3, 10), dtype=flow.float32)
>>> h0 = flow.tensor(np.random.randn(2, 3, 20), dtype=flow.float32)
>>> output, hn = rnn(input, h0)
>>> output.size()
oneflow.Size([5, 3, 20])
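To show how the recurrence, the stacked layers, and the weight shapes listed above fit together, here is a NumPy sketch of the forward pass. It covers only the unidirectional, batch_first=False, dropout-free case; the function names are hypothetical and this is not OneFlow's implementation. Weights are drawn from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) with \(k = 1/\text{hidden\_size}\), as in the Note.

```python
import numpy as np


def init_rnn_params(input_size, hidden_size, num_layers, rng):
    # U(-sqrt(k), sqrt(k)) with k = 1 / hidden_size, as in the Note above.
    bound = np.sqrt(1.0 / hidden_size)
    params = []
    for layer in range(num_layers):
        in_features = input_size if layer == 0 else hidden_size
        params.append((
            rng.uniform(-bound, bound, (hidden_size, in_features)),  # weight_ih_l[k]
            rng.uniform(-bound, bound, (hidden_size, hidden_size)),  # weight_hh_l[k]
            rng.uniform(-bound, bound, hidden_size),                 # bias_ih_l[k]
            rng.uniform(-bound, bound, hidden_size),                 # bias_hh_l[k]
        ))
    return params


def elman_rnn(x, h0, params, nonlinearity="tanh"):
    """Forward pass for x of shape (L, N, H_in) and h0 of shape (num_layers, N, H_out)."""
    act = np.tanh if nonlinearity == "tanh" else (lambda z: np.maximum(z, 0.0))
    layer_input, h_n = x, []
    for k, (w_ih, w_hh, b_ih, b_hh) in enumerate(params):
        h, outputs = h0[k], []
        for t in range(layer_input.shape[0]):
            h = act(layer_input[t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
            outputs.append(h)
        layer_input = np.stack(outputs)  # this layer's outputs feed the next layer
        h_n.append(h)
    return layer_input, np.stack(h_n)


rng = np.random.default_rng(0)
params = init_rnn_params(10, 20, num_layers=2, rng=rng)
output, hn = elman_rnn(rng.standard_normal((5, 3, 10)), rng.standard_normal((2, 3, 20)), params)
print(output.shape, hn.shape)  # (5, 3, 20) (2, 3, 20)
```

The last time step of output equals the last layer's entry in h_n, which is a quick consistency check when comparing against the layer's own outputs.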
-
class
oneflow.nn.
RNNCell
(input_size: int, hidden_size: int, bias: bool = True, nonlinearity: str = 'tanh', device=None, dtype=None)¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html.
An Elman RNN cell with tanh or ReLU non-linearity.
\[h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})\]
If nonlinearity is ‘relu’, then ReLU is used in place of tanh.
- Parameters
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- Inputs: input, hidden
input: tensor containing input features
hidden: tensor containing the initial hidden state. Defaults to zero if not provided.
- Outputs: h’
h’ of shape (batch, hidden_size): tensor containing the next hidden state for each element in the batch
- Shape:
input: \((N, H_{in})\) or \((H_{in})\) tensor containing input features where \(H_{in}\) = input_size.
hidden: \((N, H_{out})\) or \((H_{out})\) tensor containing the initial hidden state where \(H_{out}\) = hidden_size. Defaults to zero if not provided.
output: \((N, H_{out})\) or \((H_{out})\) tensor containing the next hidden state.
-
weight_ih
¶ the learnable input-hidden weights, of shape (hidden_size, input_size)
-
weight_hh
¶ the learnable hidden-hidden weights, of shape (hidden_size, hidden_size)
-
bias_ih
¶ the learnable input-hidden bias, of shape (hidden_size)
-
bias_hh
¶ the learnable hidden-hidden bias, of shape (hidden_size)
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\)
For example:
>>> import oneflow as flow
>>> import oneflow.nn as nn
>>> rnn = nn.RNNCell(10, 20)
>>> input = flow.randn(6, 3, 10)
>>> hx = flow.randn(3, 20)
>>> hx = rnn(input[0], hx)
>>> hx.size()
oneflow.Size([3, 20])
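A single cell step is the RNN recurrence without the loop over time, so the caller drives the sequence. The NumPy sketch below (hypothetical helper, not OneFlow's implementation) handles both the batched \((N, H_{in})\) and unbatched \((H_{in})\) shapes from the Shape section, because the matrix products act on the last axis; passing no biases corresponds to bias=False.

```python
import numpy as np


def rnn_cell(x, h, w_ih, w_hh, b_ih=None, b_hh=None, nonlinearity="tanh"):
    """One step h' = act(W_ih x + b_ih + W_hh h + b_hh) for batched or unbatched input."""
    pre = x @ w_ih.T + h @ w_hh.T
    if b_ih is not None:
        pre = pre + b_ih + b_hh
    return np.tanh(pre) if nonlinearity == "tanh" else np.maximum(pre, 0.0)


rng = np.random.default_rng(0)
bound = np.sqrt(1.0 / 20)  # U(-sqrt(k), sqrt(k)) with k = 1 / hidden_size
w_ih = rng.uniform(-bound, bound, (20, 10))
w_hh = rng.uniform(-bound, bound, (20, 20))
b_ih = rng.uniform(-bound, bound, 20)
b_hh = rng.uniform(-bound, bound, 20)

inputs = rng.standard_normal((6, 3, 10))
hx = rng.standard_normal((3, 20))
for t in range(inputs.shape[0]):  # drive the cell over the sequence manually
    hx = rnn_cell(inputs[t], hx, w_ih, w_hh, b_ih, b_hh)
print(hx.shape)  # (3, 20)
```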
-
class
oneflow.nn.
ReLU
(inplace: bool = False)¶ Applies the rectified linear unit function element-wise:
\(\text{ReLU}(x) = (x)^+ = \max(0, x)\)
- Parameters
inplace – can optionally do the operation in-place. Default:
False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> relu = flow.nn.ReLU()
>>> ndarr = np.asarray([1, -2, 3])
>>> x = flow.Tensor(ndarr)
>>> relu(x)
tensor([1., 0., 3.], dtype=oneflow.float32)
-
class
oneflow.nn.
ReLU6
(inplace: bool = False)¶ Applies the element-wise function:
\[\begin{split}\text{Relu6}(x) = \begin{cases} 6 & \text{ if } x > 6 \\ 0 & \text{ if } x < 0 \\ x & \text{ otherwise } \\ \end{cases}\end{split}\]
- Parameters
inplace – can optionally do the operation in-place. Default:
False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-0.5, 0, 0.5]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> relu6 = flow.nn.ReLU6()
>>> out = relu6(input)
>>> out
tensor([0.0000, 0.0000, 0.5000], dtype=oneflow.float32)
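The three cases of the ReLU6 definition collapse into a single clamp to \([0, 6]\), i.e. \(\min(\max(0, x), 6)\). A one-line NumPy sketch (hypothetical helper) makes that equivalence explicit:

```python
import numpy as np


def relu6(x):
    # min(max(0, x), 6): the three cases of the definition in one clamp.
    return np.clip(x, 0.0, 6.0)


print(relu6(np.array([-0.5, 0.0, 0.5, 7.0])))
```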
-
class
oneflow.nn.
ReflectionPad2d
(padding: Union[int, Tuple[int, int, int, int]])¶ The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ReflectionPad2d.html.
This operator pads the input tensor using the reflection of the input boundary.
- Parameters
padding (Union[int,tuple]) – The size of the padding. If an int, the same padding is used on all boundaries; if a 4-tuple, uses \((\text{padding}_{\text{left}}, \text{padding}_{\text{right}}, \text{padding}_{\text{top}}, \text{padding}_{\text{bottom}} )\)
- Returns
Returns a new tensor which is result of the reflection padding of the input tensor.
- Return type
oneflow.Tensor
- Shape:
Input: \((N, C, H_{\text{in}}, W_{\text{in}})\)
Output: \((N, C, H_{\text{out}}, W_{\text{out}})\) where
\(H_{\text{out}} = H_{\text{in}} + \text{padding}_{\text{top}} + \text{padding}_{\text{bottom}}\)
\(W_{\text{out}} = W_{\text{in}} + \text{padding}_{\text{left}} + \text{padding}_{\text{right}}\)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> input = flow.tensor(np.arange(18).reshape((1, 2, 3, 3)).astype(np.float32))
>>> m = flow.nn.ReflectionPad2d((2, 2, 1, 1))
>>> out = m(input)
>>> out
tensor([[[[ 5., 4., 3., 4., 5., 4., 3.],
          [ 2., 1., 0., 1., 2., 1., 0.],
          [ 5., 4., 3., 4., 5., 4., 3.],
          [ 8., 7., 6., 7., 8., 7., 6.],
          [ 5., 4., 3., 4., 5., 4., 3.]],
         [[14., 13., 12., 13., 14., 13., 12.],
          [11., 10., 9., 10., 11., 10., 9.],
          [14., 13., 12., 13., 14., 13., 12.],
          [17., 16., 15., 16., 17., 16., 15.],
          [14., 13., 12., 13., 14., 13., 12.]]]], dtype=oneflow.float32)
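Reflection padding mirrors the input about its edge without repeating the edge value, which is what NumPy's np.pad does with mode="reflect". The sketch below (a hypothetical helper, not OneFlow's implementation) regroups the (left, right, top, bottom) tuple into np.pad's per-axis (before, after) pairs and reproduces the example output above.

```python
import numpy as np


def reflection_pad2d(x, padding):
    """Reflection-pad the last two axes of an (N, C, H, W) array.

    padding is an int or (left, right, top, bottom); np.pad wants (before, after)
    per axis, so the tuple is regrouped as ((top, bottom), (left, right)).
    """
    left, right, top, bottom = (padding,) * 4 if isinstance(padding, int) else padding
    return np.pad(x, ((0, 0), (0, 0), (top, bottom), (left, right)), mode="reflect")


x = np.arange(18, dtype=np.float32).reshape(1, 2, 3, 3)
print(reflection_pad2d(x, (2, 2, 1, 1))[0, 0])
```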
class oneflow.nn.ReplicationPad2d(padding: Union[int, Tuple[int, int, int, int]])
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ReplicationPad2d.html.
Pads the input tensor using the replication of the input boundary.
- Parameters
padding (Union[int, tuple, list]) – the size of the padding. If an int, the same padding is used on all boundaries. If a 4-tuple, uses (\(\mathrm{padding_{left}}\), \(\mathrm{padding_{right}}\), \(\mathrm{padding_{top}}\), \(\mathrm{padding_{bottom}}\))
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\(H_{out} = H_{in} + \mathrm{padding_{top}} + \mathrm{padding_{bottom}}\)
\(W_{out} = W_{in} + \mathrm{padding_{left}} + \mathrm{padding_{right}}\)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.ReplicationPad2d((2, 2, 1, 1))
>>> input = flow.tensor(np.arange(18).reshape((1, 2, 3, 3)).astype(np.float32))
>>> input_int = flow.tensor(np.arange(18).reshape((1, 2, 3, 3)).astype(np.int32))
>>> output = m(input)
>>> output.shape
oneflow.Size([1, 2, 5, 7])
>>> output
tensor([[[[ 0., 0., 0., 1., 2., 2., 2.],
          [ 0., 0., 0., 1., 2., 2., 2.],
          [ 3., 3., 3., 4., 5., 5., 5.],
          [ 6., 6., 6., 7., 8., 8., 8.],
          [ 6., 6., 6., 7., 8., 8., 8.]],
         [[ 9., 9., 9., 10., 11., 11., 11.],
          [ 9., 9., 9., 10., 11., 11., 11.],
          [12., 12., 12., 13., 14., 14., 14.],
          [15., 15., 15., 16., 17., 17., 17.],
          [15., 15., 15., 16., 17., 17., 17.]]]], dtype=oneflow.float32)
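Replication padding differs from reflection padding only in how an out-of-range index is mapped back into the input. Here it is clamped to the nearest edge, so the edge values repeat. Below is a minimal pure-Python sketch on one plane. The helper names `clamp` and `replication_pad2d` are hypothetical and are not part of OneFlow. The sketch reproduces the first channel of the example above.

```python
def clamp(i, n):
    """Clamp an index into [0, n - 1], i.e. reuse the nearest edge element."""
    return min(max(i, 0), n - 1)


def replication_pad2d(plane, left, right, top, bottom):
    """Replication-pad a 2D list of rows (hypothetical helper for illustration)."""
    h, w = len(plane), len(plane[0])
    return [
        [plane[clamp(r, h)][clamp(c, w)] for c in range(-left, w + right)]
        for r in range(-top, h + bottom)
    ]


plane = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
padded = replication_pad2d(plane, left=2, right=2, top=1, bottom=1)
for row in padded:
    print(row)  # first row: [0, 0, 0, 1, 2, 2, 2]
```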
class oneflow.nn.SELU(inplace: bool = False)
Applies the element-wise function:
\[\text{SELU}(x) = \text{scale} * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))\]
with \(\alpha = 1.6732632423543772848170429916717\) and \(\text{scale} = 1.0507009873554804934193349852946\).
Warning
When using kaiming_normal or kaiming_normal_ for initialisation, nonlinearity='linear' should be used instead of nonlinearity='selu' in order to get Self-Normalizing Neural Networks. See torch.nn.init.calculate_gain() for more information.
More details can be found in the paper Self-Normalizing Neural Networks.
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([1, 2, 3]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> selu = flow.nn.SELU()
>>> out = selu(input)
>>> out
tensor([1.0507, 2.1014, 3.1521], dtype=oneflow.float32)
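The formula above can be checked with a scalar pure-Python sketch that uses the two constants exactly as documented. The function name `selu` is a hypothetical illustration and is not the OneFlow implementation. For positive inputs the function reduces to `scale * x`. For very negative inputs it saturates toward `-scale * alpha` (about -1.7581).

```python
import math

ALPHA = 1.6732632423543772848170429916717
SCALE = 1.0507009873554804934193349852946


def selu(x):
    """Scalar SELU following the documented formula (hypothetical helper)."""
    return SCALE * (max(0.0, x) + min(0.0, ALPHA * (math.exp(x) - 1.0)))


print([round(selu(v), 4) for v in [1.0, 2.0, 3.0]])  # [1.0507, 2.1014, 3.1521]
print(round(selu(-1.0), 4))  # negative inputs stay above -SCALE * ALPHA
```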
class oneflow.nn.Sequential(*args: Any)
A sequential container.
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Sequential.html?#torch.nn.Sequential.
Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.
To make it easier to understand, here is a small example:
>>> import oneflow.nn as nn
>>> from collections import OrderedDict
>>> nn.Sequential(nn.Conv2d(1,20,5), nn.ReLU(), nn.Conv2d(20,64,5), nn.ReLU())
Sequential(
  (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (1): ReLU()
  (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (3): ReLU()
)
>>> nn.Sequential(OrderedDict([
...     ('conv1', nn.Conv2d(1,20,5)),
...     ('relu1', nn.ReLU()),
...     ('conv2', nn.Conv2d(20,64,5)),
...     ('relu2', nn.ReLU())
... ]))
Sequential(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (relu1): ReLU()
  (conv2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (relu2): ReLU()
)
class oneflow.nn.SiLU(inplace: bool = False)
SiLU (Swish) activation:
\[\text{SiLU}(x) = x * \sigma(x)\]
where \(\sigma(x)\) is the logistic sigmoid.
Note
See Gaussian Error Linear Units (GELUs) where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function where the SiLU was experimented with later.
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([1, 2, 3]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> silu = flow.nn.SiLU()
>>> out = silu(input)
>>> out
tensor([0.7311, 1.7616, 2.8577], dtype=oneflow.float32)
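Because SiLU is simply the input multiplied by its own sigmoid, a scalar sketch needs only two lines of arithmetic. The names `sigmoid` and `silu` are hypothetical helpers and are not OneFlow functions. The printed values match the example above to four decimals.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU(x) = x * sigmoid(x) (hypothetical helper)."""
    return x * sigmoid(x)


print([round(silu(v), 4) for v in [1.0, 2.0, 3.0]])  # [0.7311, 1.7616, 2.8577]
```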
class oneflow.nn.Sigmoid
Applies the element-wise function:
\[\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}\]
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = flow.Tensor(np.array([0.81733328, 0.43621480, 0.10351428]))
>>> m = flow.nn.Sigmoid()
>>> out = m(x)
>>> out
tensor([0.6937, 0.6074, 0.5259], dtype=oneflow.float32)
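Evaluating \(1 / (1 + \exp(-x))\) literally overflows in Python for large negative `x`. A common remedy, shown here as a sketch, is to rewrite the formula as \(\exp(x) / (1 + \exp(x))\) on that branch. The helper `stable_sigmoid` is hypothetical, and this sketch does not describe how OneFlow's kernel is actually written. It still reproduces the example values above.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that avoids OverflowError for large |x| (hypothetical helper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(-x) could overflow; use the algebraically equal form.
    z = math.exp(x)
    return z / (1.0 + z)


print([round(stable_sigmoid(v), 4) for v in [0.81733328, 0.43621480, 0.10351428]])
# approx. [0.6937, 0.6074, 0.5259], matching the example above
print(stable_sigmoid(-1000.0))  # 0.0, no OverflowError
```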
class oneflow.nn.SmoothL1Loss(reduction: str = 'mean', beta: float = 1.0)
Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.SmoothL1Loss.html.
It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick). For a batch of size \(N\), the unreduced loss can be described as:
\[\ell(x, y) = L = \{l_1, ..., l_N\}^T\]
with
\[\begin{split}l_n = \begin{cases} 0.5 (x_n - y_n)^2 / beta, & \text{if } |x_n - y_n| < beta \\ |x_n - y_n| - 0.5 * beta, & \text{otherwise } \end{cases}\end{split}\]
If reduction is not 'none', then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]
Note
Smooth L1 loss can be seen as exactly L1Loss, but with the \(|x - y| < beta\) portion replaced with a quadratic function such that its slope is 1 at \(|x - y| = beta\). The quadratic segment smooths the L1 loss near \(|x - y| = 0\).
Note
Smooth L1 loss is closely related to HuberLoss, being equivalent to \(huber(x, y) / beta\) (note that Smooth L1's beta hyper-parameter is also known as delta for Huber). This leads to the following differences:
As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss.
As beta -> \(+\infty\), Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
- Parameters
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
beta (float, optional) – Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0
- Shape:
Input: \((N, *)\) where \(*\) means any number of additional dimensions
Target: \((N, *)\); same shape as the input
Output: scalar. If reduction is 'none', then \((N, *)\); same shape as the input
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = flow.tensor(np.array([0.1, 0.4, 0.3, 0.5, 0.9]).astype(np.float32), dtype=flow.float32)
>>> y = flow.tensor(np.array([0.3, 0.9, 2.5, 0.4, 0.3]).astype(np.float32), dtype=flow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="none")
>>> out = m(x, y)
>>> out
tensor([0.0200, 0.1250, 1.7000, 0.0050, 0.1800], dtype=oneflow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="mean")
>>> out = m(x, y)
>>> out
tensor(0.4060, dtype=oneflow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="sum")
>>> out = m(x, y)
>>> out
tensor(2.0300, dtype=oneflow.float32)
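The piecewise definition and the three reduction modes fit in a short pure-Python reference. This reference also checks the Huber relation stated in the note above. `smooth_l1_loss` and `huber` are hypothetical helpers written only from the formulas in this section, and they are not OneFlow's implementation. The sketch reproduces the mean (0.406) and sum (2.03) from the example.

```python
def smooth_l1_loss(x, y, beta=1.0, reduction="mean"):
    """Hypothetical reference implementation of the formula above on Python lists."""
    losses = []
    for xn, yn in zip(x, y):
        diff = abs(xn - yn)
        if diff < beta:
            losses.append(0.5 * diff * diff / beta)  # quadratic segment
        else:
            losses.append(diff - 0.5 * beta)  # L1 segment with slope 1
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


def huber(diff, delta):
    """Huber loss of one absolute difference, used only to check the note above."""
    return 0.5 * diff * diff if diff < delta else delta * (diff - 0.5 * delta)


x = [0.1, 0.4, 0.3, 0.5, 0.9]
y = [0.3, 0.9, 2.5, 0.4, 0.3]
print(smooth_l1_loss(x, y, reduction="none"))
print(round(smooth_l1_loss(x, y, reduction="mean"), 4))  # 0.406
print(round(smooth_l1_loss(x, y, reduction="sum"), 4))   # 2.03

# Smooth L1 equals Huber / beta for any beta > 0.
for d in [0.1, 0.5, 2.2]:
    for beta in [0.5, 1.0, 2.0]:
        assert abs(smooth_l1_loss([d], [0.0], beta, "sum") - huber(d, beta) / beta) < 1e-12
```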
class oneflow.nn.Softmax(dim: Optional[int] = None)
Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and every slice along dim sums to 1.
Softmax is defined as:
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
When the input Tensor is a sparse tensor, the unspecified values are treated as -inf.
- Shape:
Input: \((*)\) where * means any number of additional dimensions
Output: \((*)\), same shape as the input
- Returns
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
- Parameters
dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> m = flow.nn.Softmax(dim = 2)
>>> x = flow.Tensor(
...     np.array(
...         [[[-0.46716809, 0.40112534, 0.61984003],
...           [-1.31244969, -0.42528763, 1.47953856]]]
...     )
... )
>>> out = m(x)
>>> out
tensor([[[0.1575, 0.3754, 0.4671],
         [0.0507, 0.1230, 0.8263]]], dtype=oneflow.float32)
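In the example, each innermost row is one slice along `dim=2`, so softmax is applied to each row independently. The pure-Python sketch below applies the formula to a single slice. It subtracts the slice maximum first, which is a standard trick that leaves the result unchanged because the factor cancels in the ratio and keeps `exp` from overflowing. The helper `softmax` is hypothetical, and whether OneFlow uses the same trick internally is not stated here.

```python
import math


def softmax(values):
    """Softmax over one slice (hypothetical helper).

    Subtracting the max cancels in the ratio but prevents overflow in exp().
    """
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


rows = [[-0.46716809, 0.40112534, 0.61984003], [-1.31244969, -0.42528763, 1.47953856]]
for row in rows:
    print([round(p, 4) for p in softmax(row)])
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; the naive formula would overflow
```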
class oneflow.nn.Softplus(beta: int = 1, threshold: int = 20)
Applies the element-wise function:
\[\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))\]
SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive.
For numerical stability the implementation reverts to the linear function when \(input \times \beta > threshold\).
- Parameters
beta – the \(\beta\) value for the Softplus formulation. Default: 1
threshold – values above this revert to a linear function. Default: 20
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-0.5, 0, 0.5]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> softplus = flow.nn.Softplus()
>>> out = softplus(input)
>>> out
tensor([0.4741, 0.6931, 0.9741], dtype=oneflow.float32)
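The threshold rule can be sketched in plain Python as follows. When \(x \beta\) exceeds the threshold, \(\log(1 + \exp(\beta x)) / \beta\) differs from \(x\) by less than \(\exp(-\text{threshold}) / \beta\), which is about 2e-9 for the defaults. Returning \(x\) directly is therefore accurate, and it also avoids overflow in `exp`. The helper `softplus` is hypothetical and is not OneFlow's kernel.

```python
import math


def softplus(x, beta=1.0, threshold=20.0):
    """Scalar Softplus with the linear fallback described above (hypothetical helper)."""
    if x * beta > threshold:
        # log(1 + exp(beta*x)) / beta - x < exp(-threshold) / beta, so x is accurate.
        return x
    return math.log1p(math.exp(beta * x)) / beta


print([round(softplus(v), 4) for v in [-0.5, 0.0, 0.5]])  # [0.4741, 0.6931, 0.9741]
print(softplus(1000.0))  # 1000.0; exp(1000) alone would overflow
```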
class oneflow.nn.Softshrink(lambd: float = 0.5, inplace: bool = False)
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Softshrink.html.
The Softshrink activation.
The formula is:
\[\begin{split}\text{Softshrink}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\ x + \lambda, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
- Parameters
lambd – the \(\lambda\) value for the Softshrink formulation. Default: 0.5
inplace – can optionally do the operation in-place. Default:
False
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-1, 0, 0.2, 0.5]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> softshrink = flow.nn.Softshrink(lambd=0.5)
>>> out = softshrink(input)
>>> out
tensor([-0.5000, 0.0000, 0.0000, 0.0000], dtype=oneflow.float32)
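A scalar sketch of the three cases makes one boundary explicit. The comparisons are strict, so an input exactly equal to \(\lambda\) (0.5 in the example) maps to 0. The helper `softshrink` is hypothetical and is not a OneFlow API.

```python
def softshrink(x, lambd=0.5):
    """Scalar Softshrink following the piecewise formula (hypothetical helper)."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0  # covers the closed interval [-lambd, lambd]


print([softshrink(v) for v in [-1.0, 0.0, 0.2, 0.5]])  # [-0.5, 0.0, 0.0, 0.0]
```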
class oneflow.nn.Softsign(inplace: bool = False)
The SoftSign activation.
The formula is:
\[\text{SoftSign}(x) = \frac{x}{1 + |x|}\]
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([1, 2, 3]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> softsign = flow.nn.Softsign()
>>> out = softsign(input)
>>> out
tensor([0.5000, 0.6667, 0.7500], dtype=oneflow.float32)
class oneflow.nn.Tanh
This operator computes the element-wise hyperbolic tangent of a Tensor.
The equation is:
\[out = \frac{e^x-e^{-x}}{e^x+e^{-x}}\]
- Parameters
input (oneflow.Tensor) – A Tensor
- Returns
The result Tensor
- Return type
oneflow.Tensor
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> x = np.array([-1, 0, 1]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> tanh = flow.nn.Tanh()
>>> out = tanh(input)
>>> out
tensor([-0.7616, 0.0000, 0.7616], dtype=oneflow.float32)
class oneflow.nn.Threshold(threshold: float, value: float)
The Threshold activation. Returns x if x is greater than threshold, otherwise returns value.
The interface is consistent with PyTorch. The documentation is referenced from https://pytorch.org/docs/1.10/generated/torch.nn.Threshold.html.
The formula is:
\[\begin{split}\text{Threshold}(x) = \begin{cases} x, & \text{ if } x > \text{ threshold } \\ \text{value }, & \text{ otherwise } \end{cases}\end{split}\]
- Parameters
threshold (float) – The threshold value for the Threshold formulation
value (float) – The value that replaces every element not greater than threshold
- Shape:
Input: \((N, *)\) where * means any number of additional dimensions
Output: \((N, *)\), same shape as the input
- Returns
The result tensor
- Return type
oneflow.Tensor
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = np.array([-1, 0, 0.5, 1]).astype(np.float32)
>>> input = flow.Tensor(x)
>>> th = flow.nn.Threshold(threshold=0.5, value=0.2)
>>> out = th(input)
>>> out
tensor([0.2000, 0.2000, 0.2000, 1.0000], dtype=oneflow.float32)
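The strict comparison in the formula is why the element equal to 0.5 in the example is replaced by 0.2. A one-line scalar sketch shows this behavior. The helper name `threshold` is hypothetical and is not the OneFlow module.

```python
def threshold(x, thresh, value):
    """Scalar Threshold (hypothetical helper): keep x only if strictly above thresh."""
    return x if x > thresh else value


print([threshold(v, 0.5, 0.2) for v in [-1.0, 0.0, 0.5, 1.0]])  # [0.2, 0.2, 0.2, 1.0]
```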
class oneflow.nn.TripletMarginLoss(margin: float = 1.0, p: float = 2.0, eps: float = 1e-06, swap: bool = False, size_average=None, reduce=None, reduction: str = 'mean')
Creates a criterion that measures the triplet loss given input tensors \(x1\), \(x2\), \(x3\) and a margin with a value greater than \(0\). This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n (i.e., anchor, positive example and negative example, respectively). The shapes of all input tensors should be \((N, D)\).
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.
The loss function for each sample in the mini-batch is:
\[L(a, p, n) = \max \{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\}\]
where
\[d(x_i, y_i) = \left\lVert {\bf x}_i - {\bf y}_i \right\rVert_p\]
- Parameters
margin (float, optional) – Default: \(1\).
p (float, optional) – The norm degree for pairwise distance. Default: \(2.0\).
eps (float, optional) – Small constant added for numerical stability when computing the pairwise distance. Default: \(1e-6\).
swap (bool, optional) – The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False.
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
Input: \((N, D)\) where \(D\) is the vector dimension.
Output: A Tensor of shape \((N)\) if reduction is 'none', or a scalar otherwise.
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> triplet_loss = flow.nn.TripletMarginLoss(margin=1.0, p=2)
>>> anchor = np.array([[1, -1, 1],[-1, 1, -1], [1, 1, 1]])
>>> positive = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> negative = np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
>>> output = triplet_loss(flow.Tensor(anchor), flow.Tensor(positive), flow.Tensor(negative))
>>> output
tensor(6.2971, dtype=oneflow.float32)
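The per-sample formula and the reductions can be followed step by step in a pure-Python sketch over lists of vectors. It reproduces the 6.2971 from the example. `pairwise_distance` and `triplet_margin_loss` are hypothetical helpers. Adding `eps` to the difference before taking the norm is an assumption borrowed from PyTorch's `pairwise_distance`, and it only matters near zero distance. The `swap` branch replaces \(d(a, n)\) with the smaller of \(d(a, n)\) and \(d(p, n)\), which is the distance swap from the Balntas et al. paper.

```python
def pairwise_distance(x, y, p=2.0, eps=1e-6):
    # Assumption: eps is added to the difference before the p-norm, as in PyTorch.
    return sum(abs(a - b + eps) ** p for a, b in zip(x, y)) ** (1.0 / p)


def triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2.0,
                        eps=1e-6, swap=False, reduction="mean"):
    """Hypothetical reference implementation over lists of D-dimensional vectors."""
    losses = []
    for a, pos, neg in zip(anchor, positive, negative):
        d_ap = pairwise_distance(a, pos, p, eps)
        d_an = pairwise_distance(a, neg, p, eps)
        if swap:
            # Distance swap: use the harder (smaller) of d(a, n) and d(p, n).
            d_an = min(d_an, pairwise_distance(pos, neg, p, eps))
        losses.append(max(d_ap - d_an + margin, 0.0))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


anchor = [[1, -1, 1], [-1, 1, -1], [1, 1, 1]]
positive = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
negative = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
print(round(triplet_margin_loss(anchor, positive, negative), 4))  # 6.2971
```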
class oneflow.nn.Unfold(kernel_size: Union[int, Tuple[int, int]], dilation: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, stride: Union[int, Tuple[int, int]] = 1)
class oneflow.nn.Upsample(size: Optional[Union[int, Tuple[int, …]]] = None, scale_factor: Optional[Union[float, Tuple[float, …]]] = None, mode: str = 'nearest', align_corners: Optional[bool] = None)
The interface is consistent with PyTorch.
The documentation is referenced from: https://pytorch.org/docs/1.10/_modules/torch/nn/modules/upsampling.html.
Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.
The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively.
One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous.)
- Parameters
size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. Has to match input size if it is a tuple.
mode (str, optional) – the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'
align_corners (bool, optional) – if True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has effect when mode is 'linear', 'bilinear', or 'trilinear'. Default: False
- Shape:
Input: \((N, C, W_{in})\), \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, W_{out})\), \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\), where
\[D_{out} = \left\lfloor D_{in} \times \text{scale_factor} \right\rfloor\]
\[H_{out} = \left\lfloor H_{in} \times \text{scale_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale_factor} \right\rfloor\]
Warning
With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic, and trilinear) don't proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to PyTorch version 0.3.1. Since then, the default behavior is align_corners = False.
Note
If you want downsampling/general resizing, you should use interpolate().
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> input = flow.tensor(np.arange(1, 5).reshape((1, 1, 2, 2)), dtype=flow.float32)
>>> input = input.to("cuda")
>>> m = flow.nn.Upsample(scale_factor=2.0, mode="nearest")
>>> output = m(input)
>>> output
tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]], device='cuda:0', dtype=oneflow.float32)
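The nearest example can be reproduced without a GPU using a pure-Python sketch on one 2D plane. The output size follows the floor formula above. Each output pixel copies the input pixel at index \(\lfloor \text{dst} / \text{scale_factor} \rfloor\), clamped to the last valid index. That source-index rule is an assumption modeled on PyTorch's 'nearest' mode, and results for non-integer factors may differ from OneFlow's kernel because of floating-point rounding. The helper `upsample_nearest2d` is hypothetical.

```python
import math


def upsample_nearest2d(plane, scale_factor):
    """Nearest-neighbour upsampling of one 2D plane (hypothetical helper).

    Output size is floor(in * scale_factor); each output pixel copies the input
    pixel at floor(dst / scale_factor), clamped to the last valid index.
    """
    h_in, w_in = len(plane), len(plane[0])
    h_out = math.floor(h_in * scale_factor)
    w_out = math.floor(w_in * scale_factor)
    return [
        [plane[min(int(r / scale_factor), h_in - 1)][min(int(c / scale_factor), w_in - 1)]
         for c in range(w_out)]
        for r in range(h_out)
    ]


for row in upsample_nearest2d([[1, 2], [3, 4]], 2.0):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```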
class oneflow.nn.UpsamplingBilinear2d(size: Optional[Tuple[int, int]] = None, scale_factor: Optional[Tuple[float, float]] = None)
Applies a 2D bilinear upsampling to an input signal composed of several input channels.
To specify the scale, it takes either the size or the scale_factor as its constructor argument.
When size is given, it is the output size of the image (h, w).
- Parameters
size (int or Tuple[int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float, float], optional) – multiplier for spatial size.
Warning
This class is deprecated in favor of interpolate(). It is equivalent to nn.functional.interpolate(..., mode='bilinear', align_corners=True).
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor H_{in} \times \text{scale_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale_factor} \right\rfloor\]
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> input = flow.tensor(np.arange(1, 5).reshape((1, 1, 2, 2)), dtype=flow.float32)
>>> input = input.to("cuda")
>>> m = flow.nn.UpsamplingBilinear2d(scale_factor=2.0)
>>> output = m(input)
>>> output
tensor([[[[1.0000, 1.3333, 1.6667, 2.0000],
          [1.6667, 2.0000, 2.3333, 2.6667],
          [2.3333, 2.6667, 3.0000, 3.3333],
          [3.0000, 3.3333, 3.6667, 4.0000]]]], device='cuda:0', dtype=oneflow.float32)
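With align_corners=True, the first and last output pixels land exactly on the first and last input pixels. This explains why the example's corners stay 1, 2, 3 and 4 while the interior steps in thirds. The pure-Python sketch below maps output index `dst` to source coordinate \(\text{dst} \cdot (n_{in} - 1) / (n_{out} - 1)\) and then blends the four neighbours. `upsample_bilinear2d_align_corners` is a hypothetical helper written only to illustrate the mapping, not OneFlow's implementation.

```python
def upsample_bilinear2d_align_corners(plane, h_out, w_out):
    """Bilinear upsampling of one 2D plane with align_corners=True (hypothetical helper)."""
    h_in, w_in = len(plane), len(plane[0])

    def src_coord(dst, n_in, n_out):
        # align_corners=True: output pixel 0 -> input 0, last output -> last input.
        return 0.0 if n_out == 1 else dst * (n_in - 1) / (n_out - 1)

    out = []
    for r in range(h_out):
        y = src_coord(r, h_in, h_out)
        y0 = int(y)
        y1 = min(y0 + 1, h_in - 1)
        wy = y - y0
        row = []
        for c in range(w_out):
            x = src_coord(c, w_in, w_out)
            x0 = int(x)
            x1 = min(x0 + 1, w_in - 1)
            wx = x - x0
            top = plane[y0][x0] * (1 - wx) + plane[y0][x1] * wx
            bottom = plane[y1][x0] * (1 - wx) + plane[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


for row in upsample_bilinear2d_align_corners([[1.0, 2.0], [3.0, 4.0]], 4, 4):
    print([round(v, 4) for v in row])
# first row: [1.0, 1.3333, 1.6667, 2.0]
```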
class oneflow.nn.UpsamplingNearest2d(size: Optional[Tuple[int, int]] = None, scale_factor: Optional[Tuple[float, float]] = None)
Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
To specify the scale, it takes either the size or the scale_factor as its constructor argument.
When size is given, it is the output size of the image (h, w).
- Parameters
size (int or Tuple[int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float, float], optional) – multiplier for spatial size.
Warning
This class is deprecated in favor of interpolate().
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor H_{in} \times \text{scale_factor} \right\rfloor\]
\[W_{out} = \left\lfloor W_{in} \times \text{scale_factor} \right\rfloor\]
For example:
>>> import numpy as np
>>> import oneflow as flow
>>> input = flow.tensor(np.arange(1, 5).reshape((1, 1, 2, 2)), dtype=flow.float32)
>>> input = input.to("cuda")
>>> m = flow.nn.UpsamplingNearest2d(scale_factor=2.0)
>>> output = m(input)
>>> output
tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]], device='cuda:0', dtype=oneflow.float32)
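The upsampling classes above accept either `size` or `scale_factor`, and the shape formulas apply only to the `scale_factor` case. A small pure-Python sketch of how the output size could be resolved from one of the two arguments is shown below. `resolve_output_size` is a hypothetical helper, and the rule of rejecting both or neither follows the "you cannot give both" remark in Upsample rather than any documented OneFlow function.

```python
import math


def resolve_output_size(h_in, w_in, size=None, scale_factor=None):
    """Return (H_out, W_out) from exactly one of size / scale_factor (hypothetical helper)."""
    if (size is None) == (scale_factor is None):
        raise ValueError("give exactly one of size or scale_factor")
    if size is not None:
        return (size, size) if isinstance(size, int) else tuple(size)
    if isinstance(scale_factor, (int, float)):
        scale_factor = (scale_factor, scale_factor)
    sh, sw = scale_factor
    # H_out = floor(H_in * scale_factor), W_out = floor(W_in * scale_factor)
    return math.floor(h_in * sh), math.floor(w_in * sw)


print(resolve_output_size(2, 2, scale_factor=2.0))  # (4, 4)
print(resolve_output_size(3, 5, scale_factor=1.5))  # (4, 7)
```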
class oneflow.nn.ZeroPad2d(padding: Union[int, tuple, list])
The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.ZeroPad2d.html.
Pads the input tensor boundaries with zeros. The amount of padding is set by the padding parameter.
- Parameters
padding (Union[int, tuple]) – the size of the padding. If an int, the same padding is used on all boundaries. If a 4-tuple, uses (\(\mathrm{padding_{left}}\), \(\mathrm{padding_{right}}\), \(\mathrm{padding_{top}}\), \(\mathrm{padding_{bottom}}\))
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\(H_{out} = H_{in} + \mathrm{padding_{top}} + \mathrm{padding_{bottom}}\)
\(W_{out} = W_{in} + \mathrm{padding_{left}} + \mathrm{padding_{right}}\)
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m1 = flow.nn.ZeroPad2d(2)
>>> m2 = flow.nn.ZeroPad2d((1,2,2,0))
>>> input = flow.tensor(np.arange(18).reshape((1, 2, 3, 3)).astype(np.float32))
>>> output = m1(input)
>>> output.shape
oneflow.Size([1, 2, 7, 7])
>>> output
tensor([[[[ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 1., 2., 0., 0.],
          [ 0., 0., 3., 4., 5., 0., 0.],
          [ 0., 0., 6., 7., 8., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.]],
         [[ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 9., 10., 11., 0., 0.],
          [ 0., 0., 12., 13., 14., 0., 0.],
          [ 0., 0., 15., 16., 17., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0., 0.]]]], dtype=oneflow.float32)
>>> output = m2(input)
>>> output
tensor([[[[ 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 1., 2., 0., 0.],
          [ 0., 3., 4., 5., 0., 0.],
          [ 0., 6., 7., 8., 0., 0.]],
         [[ 0., 0., 0., 0., 0., 0.],
          [ 0., 0., 0., 0., 0., 0.],
          [ 0., 9., 10., 11., 0., 0.],
          [ 0., 12., 13., 14., 0., 0.],
          [ 0., 15., 16., 17., 0., 0.]]]], dtype=oneflow.float32)
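The 4-tuple order (left, right, top, bottom) is easy to mix up. The pure-Python sketch below pads one plane and reproduces the first channel of the `m2 = ZeroPad2d((1,2,2,0))` example: two zero rows on top, one zero column on the left, two on the right, and none at the bottom. `zero_pad2d` is a hypothetical helper and not a OneFlow API.

```python
def zero_pad2d(plane, padding):
    """Zero-pad a 2D list of rows (hypothetical helper for illustration)."""
    # An int pads all four sides equally; a 4-tuple is (left, right, top, bottom).
    if isinstance(padding, int):
        left = right = top = bottom = padding
    else:
        left, right, top, bottom = padding
    width = left + len(plane[0]) + right
    body = [[0] * left + list(row) + [0] * right for row in plane]
    return [[0] * width for _ in range(top)] + body + [[0] * width for _ in range(bottom)]


plane = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
for row in zero_pad2d(plane, (1, 2, 2, 0)):
    print(row)
```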
oneflow.nn.modules.pixelshuffle.PixelShufflev2(upscale_factor: Optional[int] = None, h_upscale_factor: Optional[int] = None, w_upscale_factor: Optional[int] = None) → None
Part of the documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.PixelShuffle.html.
Rearranges elements in a tensor of shape \((*, C \times r_h \times r_w, H, W)\) to a tensor of shape \((*, C, H \times r_h, W \times r_w)\), where \(r_h\) and \(r_w\) are the upscale factors.
This is useful for implementing efficient sub-pixel convolution with a stride of \(1/r\).
See the paper: Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Shi et al. (2016) for more details.
- Parameters
upscale_factor (int, optional) – factor by which to increase both the height and the width; use it only when the height and width factors are the same.
h_upscale_factor (int, optional) – factor by which to increase the height; only one of h_upscale_factor and upscale_factor can be used.
w_upscale_factor (int, optional) – factor by which to increase the width; only one of w_upscale_factor and upscale_factor can be used.
- Shape:
Input: \((*, C_{in}, H_{in}, W_{in})\), where * is zero or more batch dimensions
Output: \((*, C_{out}, H_{out}, W_{out})\), where
If upscale_factor is used:
\[ \begin{align}\begin{aligned}C_{out} = C_{in} \div \text{upscale_factor}^2\\H_{out} = H_{in} \times \text{upscale_factor}\\W_{out} = W_{in} \times \text{upscale_factor}\end{aligned}\end{align} \]
If h_upscale_factor and w_upscale_factor are used:
\[ \begin{align}\begin{aligned}C_{out} = C_{in} \div \text{h_upscale_factor} \div \text{w_upscale_factor}\\H_{out} = H_{in} \times \text{h_upscale_factor}\\W_{out} = W_{in} \times \text{w_upscale_factor}\end{aligned}\end{align} \]
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> m = flow.nn.PixelShuffle(upscale_factor=2)
>>> x = flow.Tensor(np.random.randn(3, 4, 5, 5))
>>> y = m(x)
>>> y.shape
oneflow.Size([3, 1, 10, 10])
>>> m = flow.nn.PixelShuffle(h_upscale_factor=3, w_upscale_factor=4)
>>> x = flow.Tensor(np.random.randn(1, 24, 2, 2))
>>> y = m(x)
>>> y.shape
oneflow.Size([1, 2, 6, 8])
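The rearrangement can be made concrete with a pure-Python sketch for a single sample of shape \((C \times r_h \times r_w, H, W)\). The channel layout it uses is an assumption: input channel \(c \cdot r_h r_w + i \cdot r_w + j\) feeds output position \((h \cdot r_h + i, w \cdot r_w + j)\) of output channel \(c\). This layout is PyTorch's convention for equal factors, extended here to unequal ones. The helper `pixel_shuffle` is hypothetical. The sketch matches the second example's shape change from (24, 2, 2) to (2, 6, 8).

```python
def pixel_shuffle(x, h_factor, w_factor):
    """Rearrange a nested list of shape (C * h_factor * w_factor, H, W)
    into shape (C, H * h_factor, W * w_factor). Hypothetical helper."""
    c_in, h_in, w_in = len(x), len(x[0]), len(x[0][0])
    if c_in % (h_factor * w_factor) != 0:
        raise ValueError("channels must be divisible by h_factor * w_factor")
    c_out = c_in // (h_factor * w_factor)
    out = [[[None] * (w_in * w_factor) for _ in range(h_in * h_factor)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h_factor):
            for j in range(w_factor):
                # Assumed layout: (c, i, j) -> channel c * h_factor * w_factor + i * w_factor + j
                src = x[c * h_factor * w_factor + i * w_factor + j]
                for h in range(h_in):
                    for w in range(w_in):
                        out[c][h * h_factor + i][w * w_factor + j] = src[h][w]
    return out


# Four 1x1 channels holding 0..3 become one 2x2 plane.
print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2, 2))  # [[[0, 1], [2, 3]]]
```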
oneflow.nn.parallel.DistributedDataParallel(module: oneflow.nn.module.Module, *, broadcast_buffers: bool = True, bucket_size: int = 10)
oneflow.nn.utils.clip_grad_norm_(parameters: Union[oneflow.Tensor, Iterable[oneflow.Tensor]], max_norm: float, norm_type: float = 2.0, error_if_nonfinite: bool = False) → oneflow.Tensor
Clips gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector. The gradients are modified in-place.
- Parameters
parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized
max_norm (float or int) – max norm of the gradients
norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.
error_if_nonfinite (bool) – if True, an error is thrown if the total norm of the gradients from parameters is nan, inf, or -inf. Default: False (will switch to True in the future)
- Returns
Total norm of the parameters' gradients (viewed as a single vector). The gradients themselves are clipped in place.
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x1 = flow.tensor(np.array([[2, 3, 4], [1.5, 2.6, 3.7]]).astype(np.float32), requires_grad=True)
>>> m1 = flow.nn.ReLU()
>>> out1 = m1(x1)
>>> out1 = out1.sum()
>>> out1.backward()
>>> norm1 = flow.nn.utils.clip_grad_norm_(x1, 0.6, 1.0)
>>> norm1
tensor(6., dtype=oneflow.float32)
>>> x1.grad
tensor([[0.1000, 0.1000, 0.1000],
        [0.1000, 0.1000, 0.1000]], dtype=oneflow.float32)
>>> x2 = flow.tensor(np.array([[-2, -3, -4], [2.5, 0, 3.2]]).astype(np.float32), requires_grad=True)
>>> out2 = flow.atan(x2)
>>> out2 = out2.sum()
>>> out2.backward()
>>> norm2 = flow.nn.utils.clip_grad_norm_(x2, 0.5)
>>> norm2
tensor(1.0394, dtype=oneflow.float32)
>>> x2.grad
tensor([[0.0962, 0.0481, 0.0283],
        [0.0663, 0.4810, 0.0428]], dtype=oneflow.float32)
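The arithmetic behind the second example can be replayed in pure Python on a flat list of gradient values. The steps are: compute the p-norm of all gradients together, then scale every gradient by \(\text{max_norm} / (\text{total_norm} + \epsilon)\) when that factor is below 1. `clip_grad_norm` is a hypothetical helper, and \(\epsilon = 10^{-6}\) is an assumption borrowed from PyTorch's reference implementation, since this page does not state OneFlow's value. The gradients are those of \(\sum \arctan(x)\), namely \(1 / (1 + x^2)\).

```python
import math


def clip_grad_norm(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Hypothetical sketch over a flat list of gradients; returns (total_norm, clipped).

    eps=1e-6 is an assumption borrowed from PyTorch's reference implementation.
    """
    if norm_type == math.inf or norm_type == "inf":
        total_norm = max(abs(g) for g in grads)
    else:
        total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # One common factor for all gradients, so the overall direction is kept.
        grads = [g * clip_coef for g in grads]
    return total_norm, grads


# Gradients of sum(atan(x2)) from the example: d/dx atan(x) = 1 / (1 + x^2).
x2 = [-2.0, -3.0, -4.0, 2.5, 0.0, 3.2]
grads = [1.0 / (1.0 + v * v) for v in x2]
total, clipped = clip_grad_norm(grads, max_norm=0.5)
print(round(total, 4))  # 1.0394
print([round(g, 4) for g in clipped])
```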
oneflow.nn.utils.weight_norm(module: T_module, name: str = 'weight', dim: int = 0) → T_module
Applies weight normalization to a parameter in the given module.
\[\mathbf{w}=g \frac{\mathbf{v}}{\|\mathbf{v}\|}\]
Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by name (e.g. 'weight') with two parameters: one specifying the magnitude (e.g. 'weight_g') and one specifying the direction (e.g. 'weight_v'). Weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every forward() call.
By default, with dim=0, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use dim=None.
See https://arxiv.org/abs/1602.07868
This description is referenced from the PyTorch documentation: https://pytorch.org/docs/1.10/generated/torch.nn.utils.weight_norm.html.
- Parameters
module (Module) – containing module
name (str, optional) – name of weight parameter
dim (int, optional) – dimension over which to compute the norm
- Returns
The original module with the weight norm hook
For example:
>>> import oneflow as flow
>>> m = flow.nn.utils.weight_norm(flow.nn.Linear(20, 40), name='weight')
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_g.size()
oneflow.Size([40, 1])
>>> m.weight_v.size()
oneflow.Size([40, 20])
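For a 2D weight with `dim=0`, the reparameterization keeps one magnitude per output row, which matches `weight_g` of shape [40, 1] next to `weight_v` of shape [40, 20] in the example. It recomputes \(\mathbf{w} = g \, \mathbf{v} / \|\mathbf{v}\|\) row by row. In the pure-Python sketch below, `weight_norm_init` and `weight_norm_compute` are hypothetical helpers. Initializing \(g = \|\mathbf{w}\|\) and \(\mathbf{v} = \mathbf{w}\) is an assumption based on PyTorch's reference implementation, and it leaves the effective weight unchanged at the start.

```python
import math


def weight_norm_init(weight):
    """Split a 2D weight (one row per output feature) into g and v with dim=0.

    Hypothetical helper: g holds one norm per row, v starts as a copy of the weight.
    """
    g = [math.sqrt(sum(w * w for w in row)) for row in weight]
    v = [list(row) for row in weight]
    return g, v


def weight_norm_compute(g, v):
    """Recompute w = g * v / ||v|| row by row, as the pre-forward hook would."""
    w = []
    for g_i, row in zip(g, v):
        norm = math.sqrt(sum(x * x for x in row))
        w.append([g_i * x / norm for x in row])
    return w


weight = [[3.0, 4.0], [1.0, 0.0]]
g, v = weight_norm_init(weight)
print(g)                          # [5.0, 1.0]
print(weight_norm_compute(g, v))  # [[3.0, 4.0], [1.0, 0.0]]
# Rescaling v changes nothing: only its direction matters.
print(weight_norm_compute(g, [[6.0, 8.0], [2.0, 0.0]]))  # [[3.0, 4.0], [1.0, 0.0]]
```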
oneflow.nn.utils.remove_weight_norm(module: T_module, name: str = 'weight') → T_module
Removes the weight normalization reparameterization from a module.
- Parameters
module (Module) – containing module
name (str, optional) – name of weight parameter
For example:
>>> import oneflow as flow
>>> m = flow.nn.utils.weight_norm(flow.nn.Linear(20, 40))
>>> flow.nn.utils.remove_weight_norm(m)
Linear(in_features=20, out_features=40, bias=True)