# oneflow.nn.CTCLoss¶

class oneflow.nn.CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)

The Connectionist Temporal Classification loss. The interface is consistent with PyTorch. The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.nn.CTCLoss.html.

Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be $$\leq$$ the input length.

Parameters
• blank (int, optional) – blank label. Default: $$0$$.

• reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses will be summed. Default: 'mean'

• zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Default: False

Shape:
• Log_probs: Tensor of size $$(T, N, C)$$, where $$T = \text{input length}$$, $$N = \text{batch size}$$, and $$C = \text{number of classes (including blank)}$$.

• Targets: Tensor of size $$(N, S)$$ or $$(\operatorname{sum}(\text{target_lengths}))$$, where $$N = \text{batch size}$$ and $$S = \text{max target length, if shape is } (N, S)$$. It represents the target sequences. Each element in the target sequence is a class index, and it cannot be the blank index (default $$0$$). In the $$(N, S)$$ form, targets are padded to the length of the longest sequence and stacked. In the $$(\operatorname{sum}(\text{target_lengths}))$$ form, the targets are assumed to be un-padded and concatenated along a single dimension.

• Input_lengths: Tuple or tensor of size $$(N)$$, where $$N = \text{batch size}$$. It represents the lengths of the inputs (each must be $$\leq T$$). Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.

• Target_lengths: Tuple or tensor of size $$(N)$$, where $$N = \text{batch size}$$. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is $$(N, S)$$, target_lengths are effectively the stop index $$s_n$$ for each target sequence, such that target_n = targets[n,0:s_n] for each target in a batch. Lengths must each be $$\leq S$$. If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
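The summation over possible alignments described above is usually computed with the CTC forward (alpha) recursion. The sketch below is a plain NumPy re-implementation for a single sequence, written only to clarify the algorithm; the function name `ctc_loss_single` and its layout are illustrative, not part of the oneflow API.

```python
import numpy as np

def ctc_loss_single(log_probs, target, blank=0):
    """CTC negative log-likelihood for one sequence (illustrative sketch).

    log_probs: (T, C) array of per-step log-probabilities.
    target:    list of class indices (no blanks).
    """
    T, _ = log_probs.shape
    # Extended label sequence: blanks interleaved around every target label.
    ext = [blank]
    for c in target:
        ext += [c, blank]
    L = len(ext)  # 2 * len(target) + 1

    # alpha[t, s]: log-probability of all alignments of the first t+1 frames
    # that end in extended state s.
    alpha = np.full((T, L), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if L > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(L):
            terms = [alpha[t - 1, s]]              # stay in the same state
            if s >= 1:
                terms.append(alpha[t - 1, s - 1])  # advance one state
            # Skipping a blank is allowed only between two distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]

    # Valid alignments end in the last label or the trailing blank.
    return -np.logaddexp.reduce([alpha[T - 1, L - 1], alpha[T - 1, L - 2]])
```

Summing this quantity over the batch items should reproduce the 'sum'-reduced loss, and averaging loss / target_length over the batch should reproduce the 'mean'-reduced loss.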

Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf

For example:

>>> import oneflow as flow

>>> log_probs = flow.tensor(
...    [
...        [[-1.1031, -0.7998, -1.5200], [-0.9808, -1.1363, -1.1908]],
...        [[-1.2258, -1.0665, -1.0153], [-1.1135, -1.2331, -0.9671]],
...        [[-1.3348, -0.6611, -1.5118], [-0.9823, -1.2355, -1.0941]],
...        [[-1.3850, -1.3273, -0.7247], [-0.8235, -1.4783, -1.0994]],
...        [[-0.9049, -0.8867, -1.6962], [-1.4938, -1.3630, -0.6547]],
...    ], dtype=flow.float32)
>>> targets = flow.tensor([[1, 2, 2], [1, 2, 2]], dtype=flow.int32)
>>> input_lengths = flow.tensor([5, 5], dtype=flow.int32)
>>> target_lengths = flow.tensor([3, 3], dtype=flow.int32)
>>> loss_mean = flow.nn.CTCLoss()
>>> out = loss_mean(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(1.1376, dtype=oneflow.float32)
>>> loss_sum = flow.nn.CTCLoss(blank=0, reduction="sum")
>>> out = loss_sum(log_probs, targets, input_lengths, target_lengths)
>>> out
tensor(6.8257, dtype=oneflow.float32)