# oneflow.nn

These are the basic building blocks for graphs:

## Containers

| Module | Description |
| --- | --- |
| `Module` | Base class for all neural network modules. |
| `Sequential` | A sequential container. |
| `ModuleList` | Holds submodules in a list. |
| `ModuleDict` | Holds submodules in a dictionary. |
| `ParameterList` | Holds parameters in a list. |
| `ParameterDict` | Holds parameters in a dictionary. |
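
A minimal sketch of how the containers compose; the layer sizes and input shape are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

# Sequential chains submodules and calls them in order.
block = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 8),
)

# ModuleList registers submodules (and their parameters) without fixing a call order.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])

x = flow.randn(4, 16)
y = block(x)            # (4, 8)
for layer in layers:
    y = layer(y)        # still (4, 8)
```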

## nn.Module

| Method | Description |
| --- | --- |
| `add_module` | Adds a child module to the current module. |
| `apply` | Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. |
| `buffers` | Returns an iterator over module buffers. |
| `children` | Returns an iterator over immediate children modules. |
| `cpu` | Moves all model parameters and buffers to the CPU. |
| `cuda` | Moves all model parameters and buffers to the GPU. |
| `double` | Casts all floating point parameters and buffers to double datatype. |
| `train` | Sets the module in training mode. |
| `eval` | Sets the module in evaluation mode. |
| `extra_repr` | Sets the extra representation of the module. |
| `float` | Casts all floating point parameters and buffers to float datatype. |
| `forward` |  |
| `load_state_dict` | Copies parameters and buffers from `state_dict` into this module and its descendants. |
| `modules` | Returns an iterator over all modules in the network. |
| `named_buffers` | Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. |
| `named_children` | Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. |
| `named_modules` | Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. |
| `named_parameters` | Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. |
| `parameters` | Returns an iterator over module parameters. |
| `register_buffer` | Adds a buffer to the module. |
| `register_forward_hook` | Registers a forward hook on the module. |
| `register_forward_pre_hook` | Registers a forward pre-hook on the module. |
| `register_backward_hook` | Registers a backward hook on the module. |
| `register_full_backward_hook` | Registers a backward hook on the module. |
| `register_state_dict_pre_hook` | These hooks will be called with arguments: `self`, `prefix`, and `keep_vars` before calling `state_dict` on `self`. |
| `register_parameter` | Adds a parameter to the module. |
| `requires_grad_` | Changes whether autograd should record operations on parameters in this module. |
| `state_dict` | Returns a dictionary containing a whole state of the module. |
| `to` | Moves and/or casts the parameters and buffers. |
| `zero_grad` | Sets gradients of all model parameters to zero. |
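
A minimal `nn.Module` subclass exercising a few of the methods above (`named_parameters`, `register_buffer`, `eval`, `state_dict`); the toy network itself is purely illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(20, 2)
        # Buffers are saved in state_dict but are not trained.
        self.register_buffer("calls", flow.zeros(1))

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

net = TinyNet()
for name, p in net.named_parameters():
    print(name, tuple(p.shape))   # fc1.weight, fc1.bias, fc2.weight, fc2.bias
net.eval()                        # evaluation mode (affects Dropout/BatchNorm)
state = net.state_dict()          # parameters + buffers, keyed by name
net.load_state_dict(state)        # restore from a saved state dict
```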


## Convolution Layers

| Module | Description |
| --- | --- |
| `nn.Conv1d` | Applies a 1D convolution over an input signal composed of several input planes. |
| `nn.Conv2d` | Applies a 2D convolution over an input signal composed of several input planes. |
| `nn.Conv3d` | Applies a 3D convolution over an input signal composed of several input planes. |
| `nn.ConvTranspose1d` | Applies a 1D transposed convolution operator over an input image composed of several input planes. |
| `nn.ConvTranspose2d` | Applies a 2D transposed convolution operator over an input image composed of several input planes. |
| `nn.ConvTranspose3d` | Applies a 3D transposed convolution operator over an input image composed of several input planes. |
| `nn.Unfold` | The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Unfold.html. |
| `nn.Fold` | The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.Fold.html. |
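
A short sketch of `nn.Conv2d` on an NCHW batch; the channel counts and image size are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

# 2D convolution: 3 input channels -> 16 output channels, 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = flow.randn(8, 3, 32, 32)   # (N, C, H, W)
y = conv(x)
print(y.shape)                 # (8, 16, 32, 32): padding=1 preserves the spatial size
```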

## Pooling Layers

| Module | Description |
| --- | --- |
| `nn.MaxPool1d` | Applies a 1D max pooling over an input signal composed of several input planes. |
| `nn.MaxPool2d` | Applies a 2D max pooling over an input signal composed of several input planes. |
| `nn.MaxPool3d` | Applies a 3D max pooling over an input signal composed of several input planes. |
| `nn.MaxUnpool1d` | Computes a partial inverse of MaxPool1d. |
| `nn.MaxUnpool2d` | Computes a partial inverse of MaxPool2d. |
| `nn.MaxUnpool3d` | Computes a partial inverse of MaxPool3d. |
| `nn.AdaptiveAvgPool1d` | Applies a 1D adaptive average pooling over an input signal composed of several input planes. |
| `nn.AdaptiveAvgPool2d` | Applies a 2D adaptive average pooling over an input signal composed of several input planes. |
| `nn.AdaptiveAvgPool3d` | Applies a 3D adaptive average pooling over an input signal composed of several input planes. |
| `nn.AdaptiveMaxPool1d` | Applies a 1D adaptive max pooling over an input signal composed of several input planes. |
| `nn.AdaptiveMaxPool2d` | Applies a 2D adaptive max pooling over an input signal composed of several input planes. |
| `nn.AdaptiveMaxPool3d` | Applies a 3D adaptive max pooling over an input signal composed of several input planes. |
| `nn.AvgPool1d` | Applies a 1D average pooling over an input signal composed of several input planes. |
| `nn.AvgPool2d` | Performs 2D average pooling on the input. |
| `nn.AvgPool3d` | Applies a 3D average pooling over an input signal composed of several input planes. |
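
A short pooling sketch; shapes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x = flow.randn(1, 16, 32, 32)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)   # (1, 16, 16, 16)

# Adaptive pooling picks the kernel for a fixed output size,
# e.g. global average pooling before a classifier head.
gap = nn.AdaptiveAvgPool2d(output_size=(1, 1))
print(gap(x).shape)    # (1, 16, 1, 1)
```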

## Padding Layers

| Module | Description |
| --- | --- |
| `nn.ConstantPad1d` | Pads the input tensor boundaries with a constant value. |
| `nn.ConstantPad2d` | Pads the input tensor boundaries with a user-specified constant value. |
| `nn.ConstantPad3d` | Pads the input tensor boundaries with a constant value. |
| `nn.ReflectionPad1d` | Pads the input tensor using the reflection of the input boundary. |
| `nn.ReflectionPad2d` | Pads the input tensor using the reflection of the input boundary. |
| `nn.ReplicationPad1d` | Pads the input tensor using replication of the input boundary. |
| `nn.ReplicationPad2d` | Pads the input tensor using replication of the input boundary. |
| `nn.ZeroPad2d` | Pads the input tensor boundaries with zero. |
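
A short padding sketch; the padding widths and constant value are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x = flow.randn(1, 1, 4, 4)

# Pad one element of zeros on every side of the last two dims.
zpad = nn.ZeroPad2d(1)
print(zpad(x).shape)   # (1, 1, 6, 6)

# Pad (left, right, top, bottom) with a constant value.
cpad = nn.ConstantPad2d((2, 0, 1, 1), 0.5)
print(cpad(x).shape)   # (1, 1, 6, 6)
```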

## Non-linear Activations (weighted sum, nonlinearity)

| Module | Description |
| --- | --- |
| `nn.ELU` | Applies the ELU function element-wise. |
| `nn.Hardshrink` | The Hardshrink activation. |
| `nn.Hardsigmoid` | Applies the Hardsigmoid function element-wise. |
| `nn.Hardswish` | Applies the hardswish function, element-wise, as described in the paper Searching for MobileNetV3. |
| `nn.Hardtanh` | Applies the HardTanh function element-wise. |
| `nn.LeakyReLU` | Applies the LeakyReLU function element-wise. |
| `nn.LogSigmoid` | Applies the LogSigmoid function element-wise. |
| `nn.PReLU` | Applies the PReLU function element-wise. |
| `nn.ReLU` | Applies the rectified linear unit function element-wise. |
| `nn.ReLU6` | Applies the ReLU6 function element-wise. |
| `nn.SELU` | Applies the SELU function element-wise. |
| `nn.CELU` | Applies the CELU function element-wise. |
| `nn.GELU` | The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.GELU.html. |
| `nn.QuickGELU` | Applies a GELU approximation that is fast but somewhat inaccurate. |
| `nn.SiLU` | The SiLU (Swish) activation. |
| `nn.Sigmoid` | Applies the Sigmoid function element-wise. |
| `nn.Mish` | Applies the Mish function element-wise. |
| `nn.Softplus` | Applies the Softplus function element-wise. |
| `nn.Softshrink` | The Softshrink activation. |
| `nn.Softsign` | The SoftSign activation. |
| `nn.Tanh` | This operator computes the hyperbolic tangent value of the input Tensor. |
| `nn.Threshold` | The Threshold activation. |
| `nn.GLU` | The GLU activation. |
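
A quick look at two of the element-wise activations; the sample values are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x = flow.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = nn.ReLU()
gelu = nn.GELU()
print(relu(x))   # negative entries clamped to 0
print(gelu(x))   # smooth gating, roughly x * Phi(x)
```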

## Non-linear Activations (other)

| Module | Description |
| --- | --- |
| `nn.Softmax` | Applies the Softmax function to an n-dimensional input Tensor, rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. |
| `nn.LogSoftmax` | Applies the LogSoftmax function to an n-dimensional input Tensor. |
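
A sketch showing that `nn.Softmax` normalizes along the chosen dimension; the batch shape is illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

logits = flow.randn(2, 5)                # 2 rows of 5 class scores
probs = nn.Softmax(dim=1)(logits)        # each row sums to 1
log_probs = nn.LogSoftmax(dim=1)(logits)
print(probs.sum(dim=1))                  # ~[1., 1.]
```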

## Normalization Layers

| Module | Description |
| --- | --- |
| `nn.BatchNorm1d` | Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. |
| `nn.BatchNorm2d` | Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. |
| `nn.BatchNorm3d` | Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. |
| `nn.SyncBatchNorm` | Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. |
| `nn.FusedBatchNorm1d` | Applies Fused Batch Normalization over a 2D or 3D input. |
| `nn.FusedBatchNorm2d` | Applies Fused Batch Normalization over a 4D input. |
| `nn.FusedBatchNorm3d` | Applies Fused Batch Normalization over a 5D input. |
| `nn.GroupNorm` | Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization. |
| `nn.InstanceNorm1d` | Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. |
| `nn.InstanceNorm2d` | Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. |
| `nn.InstanceNorm3d` | Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. |
| `nn.LayerNorm` | Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization. |
| `nn.RMSLayerNorm` | Constructs a layernorm module in the T5 style. |
| `nn.RMSNorm` | Applies Root Mean Square Layer Normalization over a mini-batch of inputs as described in the paper Root Mean Square Layer Normalization. |
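
A sketch contrasting `nn.BatchNorm2d` (per-channel statistics across the batch) with `nn.LayerNorm` (per-sample statistics); shapes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x = flow.randn(8, 16, 10, 10)     # NCHW

bn = nn.BatchNorm2d(16)           # statistics over N, H, W for each channel
ln = nn.LayerNorm([16, 10, 10])   # statistics over the listed dims for each sample
print(bn(x).shape, ln(x).shape)   # both (8, 16, 10, 10)
```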

## Recurrent Layers

| Module | Description |
| --- | --- |
| `nn.RNN` | Applies a multi-layer Elman RNN with $$\tanh$$ or $$\text{ReLU}$$ non-linearity to an input sequence. |
| `nn.LSTM` | Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. |
| `nn.GRU` | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. |
| `nn.RNNCell` | An Elman RNN cell with tanh or ReLU non-linearity. |
| `nn.LSTMCell` | A long short-term memory (LSTM) cell. |
| `nn.GRUCell` | A gated recurrent unit (GRU) cell. |
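
A sketch of `nn.LSTM`; the sizes are illustrative and `batch_first=True` is just one layout choice:

```python
import oneflow as flow
import oneflow.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = flow.randn(4, 7, 10)          # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)
print(output.shape)               # (4, 7, 20): hidden state at every time step
print(h_n.shape)                  # (2, 4, 20): final hidden state per layer
```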

## Linear Layers

| Module | Description |
| --- | --- |
| `nn.Identity` | A placeholder identity operator that is argument-insensitive. |
| `nn.Linear` | Applies a linear transformation to the incoming data: $$y = xA^T + b$$. |
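
A sketch of `nn.Linear` computing $$y = xA^T + b$$; the feature sizes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

linear = nn.Linear(in_features=128, out_features=10)
x = flow.randn(32, 128)
y = linear(x)      # computes x A^T + b
print(y.shape)     # (32, 10)
```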

## Dropout Layers

| Module | Description |
| --- | --- |
| `nn.Dropout` | During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. |
| `nn.Dropout1d` | Randomly zero out entire channels (a channel is a 1D feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 1D tensor $$\text{input}[i, j]$$). |
| `nn.Dropout2d` | Randomly zero out entire channels (a channel is a 2D feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 2D tensor $$\text{input}[i, j]$$). |
| `nn.Dropout3d` | Randomly zero out entire channels (a channel is a 3D feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 3D tensor $$\text{input}[i, j]$$). |
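
A sketch showing that dropout is active only in training mode; `p` and the input are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

drop = nn.Dropout(p=0.5)
x = flow.ones(1, 8)

drop.train()
print(drop(x))   # about half the entries zeroed, survivors scaled by 1/(1-p)

drop.eval()
print(drop(x))   # identity: dropout is disabled in evaluation mode
```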

## Sparse Layers

| Module | Description |
| --- | --- |
| `nn.Embedding` | A simple lookup table that stores embeddings of a fixed dictionary and size. |
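
A sketch of `nn.Embedding` as an index-to-vector lookup; the table size and indices are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)
ids = flow.tensor([[1, 2, 4], [4, 3, 9]], dtype=flow.int64)
print(emb(ids).shape)   # (2, 3, 64): one 64-dim vector per index
```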

## Distance Functions

| Module | Description |
| --- | --- |
| `nn.CosineSimilarity` | Returns cosine similarity between $$x_1$$ and $$x_2$$, computed along dim. |
| `nn.PairwiseDistance` | Computes the pairwise distance between vectors $$v_1$$, $$v_2$$ using the p-norm. |
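
A sketch of the two distance modules on row vectors; shapes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x1 = flow.randn(5, 128)
x2 = flow.randn(5, 128)

cos = nn.CosineSimilarity(dim=1)
dist = nn.PairwiseDistance(p=2.0)
print(cos(x1, x2).shape)    # (5,) cosine similarity per row
print(dist(x1, x2).shape)   # (5,) Euclidean distance per row
```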

## Loss Functions

| Module | Description |
| --- | --- |
| `nn.BCELoss` | This operator computes the binary cross entropy loss. |
| `nn.BCEWithLogitsLoss` | This operator combines the Sigmoid and BCELoss together. |
| `nn.CTCLoss` | The Connectionist Temporal Classification loss. |
| `nn.CombinedMarginLoss` | The operation implements "margin_softmax" in InsightFace: https://github.com/deepinsight/insightface/blob/master/recognition/arcface_mxnet/train.py The implementation of margin_softmax in InsightFace is composed of multiple operators. |
| `nn.CrossEntropyLoss` | The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.CrossEntropyLoss.html. |
| `nn.KLDivLoss` | The Kullback-Leibler divergence loss measure. |
| `nn.L1Loss` | This operator computes the L1 Loss between each element in input and target. |
| `nn.MSELoss` | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $$x$$ and target $$y$$. |
| `nn.MarginRankingLoss` | Creates a criterion that measures the loss given inputs $$x1$$, $$x2$$, two 1D mini-batch Tensors, and a label 1D mini-batch tensor $$y$$ (containing 1 or -1). |
| `nn.NLLLoss` | The negative log likelihood loss. |
| `nn.SmoothL1Loss` | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. |
| `nn.TripletMarginLoss` | Creates a criterion that measures the triplet loss given input tensors $$x1$$, $$x2$$, $$x3$$ and a margin with a value greater than $$0$$. |
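
A sketch wiring `nn.CrossEntropyLoss` to logits and integer class targets; the random data is purely illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

logits = flow.randn(4, 10, requires_grad=True)          # raw scores for 10 classes
target = flow.tensor([1, 0, 4, 9], dtype=flow.int64)    # one class index per sample

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()
print(loss, logits.grad.shape)   # scalar loss, gradient of shape (4, 10)
```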

## Vision Layers

| Module | Description |
| --- | --- |
| `nn.PixelShuffle` | Alias of `oneflow.nn.modules.pixelshuffle.PixelShufflev2`. |
| `nn.Upsample` | Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. |
| `nn.UpsamplingBilinear2d` | Applies a 2D bilinear upsampling to an input signal composed of several input channels. |
| `nn.UpsamplingNearest2d` | Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels. |
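
A sketch of 2x upsampling; the input shape and the choice of modes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn

x = flow.randn(1, 3, 16, 16)

up_nearest = nn.Upsample(scale_factor=2, mode="nearest")
up_bilinear = nn.UpsamplingBilinear2d(scale_factor=2)
print(up_nearest(x).shape)    # (1, 3, 32, 32)
print(up_bilinear(x).shape)   # (1, 3, 32, 32)
```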

## DataParallel Layers (multi-GPU, distributed)

| Module | Description |
| --- | --- |
| `nn.COCOReader` |  |
| `nn.CoinFlip` | Generates random boolean values following a Bernoulli distribution. |
| `nn.CropMirrorNormalize` | Performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting. |
| `nn.OFRecordBytesDecoder` | This operator reads a tensor as bytes. |
| `nn.OFRecordImageDecoder` |  |
| `nn.OFRecordImageDecoderRandomCrop` |  |
| `nn.OFRecordRawDecoder` |  |
| `nn.OFRecordReader` |  |

## Quantization Aware Training

| Module | Description |
| --- | --- |
| `nn.MinMaxObserver` | Compute the quantization parameters of the input tensor. |
| `nn.MovingAverageMinMaxObserver` | Compute the quantization parameters based on the moving average of the input tensor’s min and max values. |
| `nn.FakeQuantization` | Simulate the quantize and dequantize operations in training time. |
| `nn.QatConv1d` | A Conv1d module attached with `nn.MinMaxObserver`, `nn.MovingAverageMinMaxObserver` and `nn.FakeQuantization` modules for weight and input, used for quantization aware training. |
| `nn.QatConv2d` | A Conv2d module attached with `nn.MinMaxObserver`, `nn.MovingAverageMinMaxObserver` and `nn.FakeQuantization` modules for weight and input, used for quantization aware training. |
| `nn.QatConv3d` | A Conv3d module attached with `nn.MinMaxObserver`, `nn.MovingAverageMinMaxObserver` and `nn.FakeQuantization` modules for weight and input, used for quantization aware training. |

## Utilities

From the `oneflow.nn.utils` module

| Function | Description |
| --- | --- |
| `clip_grad_norm_` | Clips gradient norm of an iterable of parameters. |
| `clip_grad_value_` | Clips gradient of an iterable of parameters at specified value. |
| `weight_norm` | Applies weight normalization to a parameter in the given module. |
| `remove_weight_norm` | Removes the weight normalization reparameterization from a module. |
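
A sketch of the gradient-clipping and weight-norm utilities on a throwaway linear layer; the norm bound is illustrative:

```python
import oneflow as flow
import oneflow.nn as nn
import oneflow.nn.utils as nn_utils

model = nn.Linear(10, 10)
model(flow.randn(4, 10)).sum().backward()

# Rescale all gradients in place so their global norm is at most max_norm.
total_norm = nn_utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(total_norm)

# Reparameterize the weight as magnitude * direction, then undo it.
model = nn_utils.weight_norm(model, name="weight")
model = nn_utils.remove_weight_norm(model, name="weight")
```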

Utility functions in other modules

| Function | Description |
| --- | --- |
| `nn.utils.rnn.PackedSequence` | The interface is consistent with PyTorch. |
| `nn.utils.rnn.pack_padded_sequence` | The interface is consistent with PyTorch. |
| `nn.utils.rnn.pad_packed_sequence` | The interface is consistent with PyTorch. |
| `nn.utils.rnn.pad_sequence` | The interface is consistent with PyTorch. |
| `nn.utils.rnn.pack_sequence` | Packs a list of variable length Tensors. |
| `nn.Flatten` | Flattens a contiguous range of dims into a tensor. |
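
A sketch of `pad_sequence` and `nn.Flatten`; the sequence lengths and feature sizes are illustrative:

```python
import oneflow as flow
import oneflow.nn as nn
from oneflow.nn.utils.rnn import pad_sequence

# Pad variable-length sequences into one (batch, max_len, features) tensor.
seqs = [flow.randn(5, 8), flow.randn(3, 8), flow.randn(2, 8)]
padded = pad_sequence(seqs, batch_first=True)
print(padded.shape)    # (3, 5, 8)

# Flatten all dims after the batch dim, e.g. before a Linear classifier.
flat = nn.Flatten()(flow.randn(4, 3, 8, 8))
print(flat.shape)      # (4, 192)
```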

## Quantized Functions

Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision.

| Module | Description |
| --- | --- |
| `nn.FakeQuantization` | Simulate the quantize and dequantize operations in training time. |
| `nn.MinMaxObserver` | Compute the quantization parameters of the input tensor. |
| `nn.MovingAverageMinMaxObserver` | Compute the quantization parameters based on the moving average of the input tensor’s min and max values. |
| `nn.Quantization` | Simulate the quantize operation in inference time. |
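
A hedged sketch of an observer followed by fake quantization. Note: the keyword names (`quantization_formula`, `quantization_bit`, `quantization_scheme`) and the call convention `fake_quant(x, scale, zero_point)` are assumptions based on the module descriptions above and should be checked against the module docstrings:

```python
import oneflow as flow

x = flow.randn(2, 3, 4, 4)

# Observer that derives quantization parameters from the tensor's min/max values.
observer = flow.nn.MinMaxObserver(
    quantization_formula="google",   # assumed keyword names, see note above
    quantization_bit=8,
    quantization_scheme="symmetric",
)
scale, zero_point = observer(x)

# Quantize and immediately dequantize so training sees the rounding error
# while gradients still flow through the tensor.
fake_quant = flow.nn.FakeQuantization(
    quantization_formula="google",
    quantization_bit=8,
    quantization_scheme="symmetric",
)
y = fake_quant(x, scale, zero_point)
print(y.shape)   # same shape as x
```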