oneflow.nn.CrossEntropyLoss

class oneflow.nn.CrossEntropyLoss(weight: Optional[oneflow.Tensor] = None, ignore_index: int = -100, reduction: str = 'mean', label_smoothing: float = 0.0)

The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.CrossEntropyLoss.html.

This criterion combines LogSoftmax and NLLLoss in one single class.

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input is expected to contain raw, unnormalized scores for each class. input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later).

The target that this criterion expects should contain either:

  • Class indices in the range \([0, C)\) where \(C\) is the number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range). The unreduced (i.e. with reduction set to 'none') loss for this case can be described as:

    \[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore_index}\}\]

    where \(x\) is the input, \(y\) is the target, \(w\) is the weight, \(C\) is the number of classes, and \(N\) spans the minibatch dimension as well as \(d_1, ..., d_K\) for the K-dimensional case. If reduction is not 'none' (default 'mean'), then

    \[\begin{split}\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore_index}\}}, & \text{if reduction} = \text{'mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]

    Note that this case is equivalent to the combination of LogSoftmax and NLLLoss.

  • Probabilities for each class; useful when labels beyond a single class per minibatch item are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with reduction set to 'none') loss for this case can be described as:

    \[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}\]

    where \(x\) is the input, \(y\) is the target, \(w\) is the weight, \(C\) is the number of classes, and \(N\) spans the minibatch dimension as well as \(d_1, ..., d_K\) for the K-dimensional case. If reduction is not 'none' (default 'mean'), then

    \[\begin{split}\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{'mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
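
The class-index formulas above can be checked with a small pure-Python sketch. This is a reference for the math only, not OneFlow's implementation; the helper names `log_softmax` and `cross_entropy_indices` and the weight vector `[1.0, 2.0, 0.5]` are illustrative assumptions. The inputs are taken from the example on this page.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of raw scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy_indices(x, y, weight=None, ignore_index=-100, reduction="mean"):
    """Reference for class-index targets: x is N lists of C scores, y is N ints.

    Assumes at least one target is not ignored when reduction is 'mean'.
    """
    num_classes = len(x[0])
    w = weight if weight is not None else [1.0] * num_classes
    losses, weight_sum = [], 0.0
    for scores, cls in zip(x, y):
        if cls == ignore_index:
            losses.append(0.0)  # ignored targets contribute nothing
            continue
        losses.append(-w[cls] * log_softmax(scores)[cls])
        weight_sum += w[cls]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    # 'mean': divide by the summed weights of the non-ignored targets
    return sum(losses) / weight_sum

x = [[-0.1664078, -1.7256707, -0.14690138],
     [-0.21474946, 0.53737473, 0.99684894],
     [-1.135804, -0.50371903, 0.7645404]]
y = [0, 1, 2]

per_sample = cross_entropy_indices(x, y, reduction="none")
mean_loss = cross_entropy_indices(x, y, reduction="mean")
# Hypothetical class weights, chosen only to show the weighted mean.
weighted_mean = cross_entropy_indices(x, y, weight=[1.0, 2.0, 0.5], reduction="mean")
print([round(v, 4) for v in per_sample], round(mean_loss, 4), round(weighted_mean, 4))
```

With unit weights the 'mean' reduction is a plain average over the non-ignored targets; with class weights the denominator becomes the sum of the weights of the targets that were actually used.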

Parameters
  • weight (oneflow.Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C.

  • ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When reduction is 'mean', the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Default: 'mean'

  • label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: \(0.0\).
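
The mixture described for label_smoothing can be sketched in pure Python. The sketch assumes the smoothed target is (1 - label_smoothing) times the one-hot target plus label_smoothing / C on every class, which is the usual reading of "a mixture of the original ground truth and a uniform distribution"; the helper names are hypothetical. With label_smoothing=0.5 it agrees with the label-smoothing output in the example on this page to about three decimal places.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of raw scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def smooth_one_hot(cls, num_classes, label_smoothing):
    """Mix the one-hot target with a uniform distribution over the classes."""
    uniform = label_smoothing / num_classes
    return [(1.0 - label_smoothing) * (1.0 if c == cls else 0.0) + uniform
            for c in range(num_classes)]

def smoothed_loss(scores, cls, label_smoothing):
    """Cross entropy between the smoothed target and softmax(scores)."""
    target = smooth_one_hot(cls, len(scores), label_smoothing)
    return -sum(t * lp for t, lp in zip(target, log_softmax(scores)))

x = [[-0.1664078, -1.7256707, -0.14690138],
     [-0.21474946, 0.53737473, 0.99684894],
     [-1.135804, -0.50371903, 0.7645404]]
y = [0, 1, 2]

smoothed = [smoothed_loss(scores, cls, 0.5) for scores, cls in zip(x, y)]
print([round(v, 4) for v in smoothed])
```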

Shape:
  • Input: Shape \((N, C)\) or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.

  • Target: If containing class indices, shape \((N)\) or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss, where each value should be in the range \([0, C)\). If containing class probabilities, same shape as the input, and each value should be in the range \([0, 1]\).

  • Output: If reduction is 'none', same shape as the target. Otherwise, scalar.

where:

\[\begin{split}\begin{aligned} C ={} & \text{number of classes} \\ N ={} & \text{batch size} \\ \end{aligned}\end{split}\]
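
To make the K-dimensional shapes concrete, here is a pure-Python sketch for K = 1 using nested lists: the input has shape (N, C, d_1), the target has shape (N, d_1), and the unreduced loss has the target's shape. The function name `k_dim_loss` and the sample values are illustrative assumptions; weights and ignore_index are left out for brevity.

```python
import math

def k_dim_loss(x, y):
    """Unreduced loss for x of shape (N, C, d1) and class-index y of shape (N, d1)."""
    out = []
    for sample, targets in zip(x, y):              # loop over the N samples
        row = []
        for pos, cls in enumerate(targets):        # loop over the d1 positions
            # Gather the C scores that belong to this position.
            scores = [channel[pos] for channel in sample]
            m = max(scores)
            lse = m + math.log(sum(math.exp(v - m) for v in scores))
            row.append(lse - scores[cls])          # -log softmax(scores)[cls]
        out.append(row)
    return out

# N = 2 samples, C = 3 classes, d1 = 2 positions per sample.
x = [[[0.5, 1.0], [0.1, -1.0], [2.0, 0.0]],
     [[0.0, 0.3], [0.0, 0.3], [0.0, 0.3]]]
y = [[2, 0], [1, 1]]

loss = k_dim_loss(x, y)
print(len(loss), len(loss[0]))  # the loss has the same (N, d1) layout as y
```

The class dimension is always the second one, so the softmax runs across the C channels at each position rather than across positions.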

For example:

>>> import oneflow as flow
>>> import numpy as np

>>> input = flow.tensor(
...     [[-0.1664078, -1.7256707, -0.14690138],
...      [-0.21474946, 0.53737473, 0.99684894],
...      [-1.135804, -0.50371903, 0.7645404]], dtype=flow.float32)
>>> target = flow.tensor(np.array([0, 1, 2]), dtype=flow.int32)
>>> out = flow.nn.CrossEntropyLoss(reduction="none")(input, target)
>>> out
tensor([0.8020, 1.1167, 0.3583], dtype=oneflow.float32)
>>> out_sum = flow.nn.CrossEntropyLoss(reduction="sum")(input, target)
>>> out_sum
tensor(2.2769, dtype=oneflow.float32)
>>> out_mean = flow.nn.CrossEntropyLoss(reduction="mean")(input, target)
>>> out_mean
tensor(0.7590, dtype=oneflow.float32)
>>> out_ignore_0 = flow.nn.CrossEntropyLoss(reduction="none", ignore_index=0)(input, target)
>>> out_ignore_0
tensor([0.0000, 1.1167, 0.3583], dtype=oneflow.float32)
>>> out_label_smoothing = flow.nn.CrossEntropyLoss(reduction="none", label_smoothing=0.5)(input, target)
>>> out_label_smoothing
tensor([1.0586, 1.1654, 0.8864], dtype=oneflow.float32)
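>>> # A floating-point target with the same shape as input is used as per-class
>>> # probabilities. The values below are not normalized; they only exercise the formula.
>>> probs = flow.tensor(
...     [[0.99495536, 0.28255007, -0.2775054],
...      [0.42397153, 0.01075112, 0.56527734],
...      [0.72356546, -0.1304398, 0.4068744]], dtype=flow.float32)
>>> out = flow.nn.CrossEntropyLoss()(input, probs)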
>>> out
tensor(1.3305, dtype=oneflow.float32)
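
The last result above can be cross-checked against the probability-target formula with a pure-Python sketch. It is a reference for the math only, not OneFlow's implementation, and the helper name `cross_entropy_probs` is an assumption made for illustration.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of raw scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy_probs(x, y, weight=None, reduction="mean"):
    """Reference for per-class targets: x and y are both N lists of C floats."""
    num_classes = len(x[0])
    w = weight if weight is not None else [1.0] * num_classes
    losses = []
    for scores, target in zip(x, y):
        log_probs = log_softmax(scores)
        losses.append(-sum(w[c] * log_probs[c] * target[c] for c in range(num_classes)))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    return sum(losses) / len(losses)  # 'mean' divides by N in this case

x = [[-0.1664078, -1.7256707, -0.14690138],
     [-0.21474946, 0.53737473, 0.99684894],
     [-1.135804, -0.50371903, 0.7645404]]
probs = [[0.99495536, 0.28255007, -0.2775054],
         [0.42397153, 0.01075112, 0.56527734],
         [0.72356546, -0.1304398, 0.4068744]]

soft_mean = cross_entropy_probs(x, probs)
# One-hot rows recover the class-index case for targets 0, 1, 2.
one_hot = cross_entropy_probs(x, [[1, 0, 0], [0, 1, 0], [0, 0, 1]], reduction="none")
print(round(soft_mean, 3), [round(v, 3) for v in one_hot])
```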