oneflow.nn.CTCLoss

class oneflow.nn.CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)

The Connectionist Temporal Classification loss. The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.CTCLoss.html.

Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be \(\leq\) the input length.

Parameters
  • blank (int, optional) – blank label. Default \(0\).

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken. Default: 'mean'

  • zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False Infinite losses mainly occur when the inputs are too short to be aligned to the targets.

Shape:
  • Log_probs: Tensor of size \((T, N, C)\), where \(T = \text{input length}\), \(N = \text{batch size}\), and \(C = \text{number of classes (including blank)}\).

  • Targets: Tensor of size \((N, S)\) or \((\operatorname{sum}(\text{target_lengths}))\), where \(N = \text{batch size}\) and \(S = \text{max target length, if shape is } (N, S)\). It represent the target sequences. Each element in the target sequence is a class index. And the target index cannot be blank (default=0). In the \((N, S)\) form, targets are padded to the length of the longest sequence, and stacked. In the \((\operatorname{sum}(\text{target_lengths}))\) form, the targets are assumed to be un-padded and concatenated within 1 dimension.

  • Input_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represent the lengths of the inputs (must each be \(\leq T\)). And the lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.

  • Target_lengths: Tuple or tensor of size \((N)\), where \(N = \text{batch size}\). It represent lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If target shape is \((N,S)\), target_lengths are effectively the stop index \(s_n\) for each target sequence, such that target_n = targets[n,0:s_n] for each target in a batch. Lengths must each be \(\leq S\) If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.

Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf

For example:

>>> import oneflow as flow

>>> log_probs = flow.tensor(
...    [
...        [[-1.1031, -0.7998, -1.5200], [-0.9808, -1.1363, -1.1908]],
...        [[-1.2258, -1.0665, -1.0153], [-1.1135, -1.2331, -0.9671]],
...        [[-1.3348, -0.6611, -1.5118], [-0.9823, -1.2355, -1.0941]],
...        [[-1.3850, -1.3273, -0.7247], [-0.8235, -1.4783, -1.0994]],
...        [[-0.9049, -0.8867, -1.6962], [-1.4938, -1.3630, -0.6547]],
...    ], dtype=flow.float32)
>>> targets = flow.tensor([[1, 2, 2], [1, 2, 2]], dtype=flow.int32)
>>> input_lengths = flow.tensor([5, 5], dtype=flow.int32)
>>> target_lengths = flow.tensor([3, 3], dtype=flow.int32)
>>> loss_mean = flow.nn.CTCLoss()
>>> out = loss_mean(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(1.1376, dtype=oneflow.float32)
>>> loss_sum = flow.nn.CTCLoss(blank=0, reduction="sum")
>>> out = loss_sum(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(6.8257, dtype=oneflow.float32)