oneflow.nn.SmoothL1Loss

class oneflow.nn.SmoothL1Loss(reduction: str = 'mean', beta: float = 1.0)

Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. The interface is consistent with PyTorch. This documentation is adapted from: https://pytorch.org/docs/1.10/generated/torch.nn.SmoothL1Loss.html.
It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (e.g. see the paper Fast R-CNN by Ross Girshick).

For a batch of size \(N\), the unreduced loss can be described as:
\[\ell(x, y) = L = \{l_1, ..., l_N\}^T\]
with
\[\begin{split}l_n = \begin{cases} 0.5 (x_n - y_n)^2 / beta, & \text{if } |x_n - y_n| < beta \\ |x_n - y_n| - 0.5 * beta, & \text{otherwise } \end{cases}\end{split}\]
If reduction is not 'none', then:
\[\begin{split}\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}\end{split}\]

Note
Smooth L1 loss can be seen as exactly L1Loss, but with the \(|x - y| < beta\) portion replaced with a quadratic function such that its slope is 1 at \(|x - y| = beta\). The quadratic segment smooths the L1 loss near \(|x - y| = 0\).

Note
Smooth L1 loss is closely related to HuberLoss, being equivalent to \(huber(x, y) / beta\) (note that Smooth L1's beta hyper-parameter is also known as delta for Huber). This leads to the following differences:
- As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss.
- As beta -> \(+\infty\), Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
- For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.
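The relationship with HuberLoss stated in the note above can be checked numerically. The following is a minimal sketch in plain Python, not OneFlow's implementation; `smooth_l1` and `huber` are hypothetical helper names written directly from the piecewise definitions, with `delta` playing the role of beta for Huber.

```python
def smooth_l1(x, y, beta=1.0):
    """Element-wise Smooth L1 term (hypothetical helper, not OneFlow code)."""
    diff = abs(x - y)
    if diff < beta:
        return 0.5 * diff ** 2 / beta  # quadratic segment
    return diff - 0.5 * beta           # L1 segment, slope 1


def huber(x, y, delta=1.0):
    """Element-wise Huber term with threshold delta (hypothetical helper)."""
    diff = abs(x - y)
    if diff < delta:
        return 0.5 * diff ** 2              # quadratic segment
    return delta * (diff - 0.5 * delta)     # L1 segment, slope delta


pairs = [(0.1, 0.3), (0.4, 0.9), (0.3, 2.5)]
for beta in (0.5, 1.0, 2.0):
    for x, y in pairs:
        # Smooth L1 equals Huber divided by beta, up to floating-point rounding.
        assert abs(smooth_l1(x, y, beta) - huber(x, y, beta) / beta) < 1e-12
```

Dividing by beta is what keeps the slope of the L1 segment at 1 for Smooth L1, whereas the Huber slope scales with the threshold.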
- Parameters
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
beta (float, optional) – Specifies the threshold at which to change between L1 and L2 loss. The value must be non-negative. Default: 1.0
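To make the effect of reduction and beta concrete, here is a plain-Python reference sketch written from the formulas on this page. It is not OneFlow's implementation: `smooth_l1_loss` is a hypothetical helper, it works on Python lists rather than tensors, and raising `ValueError` for invalid arguments is an assumption about reasonable behaviour, not documented OneFlow behaviour.

```python
def smooth_l1_loss(inputs, targets, reduction="mean", beta=1.0):
    """Reference for the documented formula (hypothetical helper, not OneFlow code)."""
    # Assumption: invalid arguments are rejected with ValueError.
    if beta < 0:
        raise ValueError("beta must be non-negative")
    if reduction not in ("none", "mean", "sum"):
        raise ValueError(f"unsupported reduction: {reduction!r}")

    losses = []
    for x, y in zip(inputs, targets):
        diff = abs(x - y)
        if diff < beta:
            # Quadratic segment; never reached when beta == 0, so no division by zero.
            losses.append(0.5 * diff * diff / beta)
        else:
            # Linear (L1) segment.
            losses.append(diff - 0.5 * beta)

    if reduction == "none":
        return losses
    total = sum(losses)
    return total if reduction == "sum" else total / len(losses)


x = [0.1, 0.4, 0.3, 0.5, 0.9]
y = [0.3, 0.9, 2.5, 0.4, 0.3]
print(smooth_l1_loss(x, y, reduction="none"))  # one loss per element
print(smooth_l1_loss(x, y, reduction="sum"))   # losses added together
print(smooth_l1_loss(x, y, reduction="mean"))  # sum divided by element count
print(smooth_l1_loss(x, y, beta=0.0))          # beta = 0 reduces to mean absolute error
```

With `beta=0.0` the condition `diff < beta` is never true, so every element takes the L1 branch, which matches the statement that Smooth L1 converges to L1Loss as beta approaches 0.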
- Shape:
Input: \((N, *)\) where \(*\) means any number of additional dimensions
Target: \((N, *)\); same shape as the input
Output: scalar. If reduction is 'none', then \((N, *)\); same shape as the input
For example:
>>> import oneflow as flow
>>> import numpy as np
>>> x = flow.tensor(np.array([0.1, 0.4, 0.3, 0.5, 0.9]).astype(np.float32), dtype=flow.float32)
>>> y = flow.tensor(np.array([0.3, 0.9, 2.5, 0.4, 0.3]).astype(np.float32), dtype=flow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="none")
>>> out = m(x, y)
>>> out
tensor([0.0200, 0.1250, 1.7000, 0.0050, 0.1800], dtype=oneflow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="mean")
>>> out = m(x, y)
>>> out
tensor(0.4060, dtype=oneflow.float32)
>>> m = flow.nn.SmoothL1Loss(reduction="sum")
>>> out = m(x, y)
>>> out
tensor(2.0300, dtype=oneflow.float32)
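The outputs in this example can be reproduced by hand from the piecewise definition with the default beta = 1. The working below is a sketch of that arithmetic, using the document's own symbols \(l_n\) and \(L\):

```latex
\[\begin{aligned}
l_1 &= 0.5 \cdot 0.2^2 / 1 = 0.02  && \text{since } |0.1 - 0.3| = 0.2 < 1 \\
l_2 &= 0.5 \cdot 0.5^2 / 1 = 0.125 && \text{since } |0.4 - 0.9| = 0.5 < 1 \\
l_3 &= 2.2 - 0.5 \cdot 1 = 1.7     && \text{since } |0.3 - 2.5| = 2.2 \ge 1 \\
l_4 &= 0.5 \cdot 0.1^2 / 1 = 0.005 && \text{since } |0.5 - 0.4| = 0.1 < 1 \\
l_5 &= 0.5 \cdot 0.6^2 / 1 = 0.18  && \text{since } |0.9 - 0.3| = 0.6 < 1 \\
\operatorname{sum}(L) &= 2.03, \qquad \operatorname{mean}(L) = 2.03 / 5 = 0.406
\end{aligned}\]
```

Only the third element exceeds the threshold, so it is the only one that takes the L1 branch.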