Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Ziyin Zhang; Ning Lu; Minghui Liao; Yongshuai Huang; Cheng Li; Min Wang; Wei Peng

TL;DR

The paper presents Distillation CTC (DCTC), a module-free self-distillation loss for CTC-based text recognition that adds frame-wise supervision through a latent alignment $z^*$ obtained by maximum a posteriori (MAP) estimation. DCTC combines the standard CTC loss with a distillation term and derives a closed-form estimate of $z^*$ from the CTC gradient $\mathbf{G}$ and the probabilities $\mathbf{P}$, which addresses alignment inconsistency without extra parameters or training phases. Empirical results on English and Chinese benchmarks show accuracy gains of up to $2.6\%$ at unchanged inference speed, and analyses indicate improved latent alignment quality and more cohesive feature representations. The method is a lightweight, practical improvement for text recognition, with gains in both model-wise and loss-wise comparisons.

Abstract

Text recognition methods are developing rapidly. Advanced techniques such as powerful modules, language models, and un- and semi-supervised learning schemes keep pushing performance on public benchmarks forward. However, the question of how to better optimize a text recognition model from the perspective of the loss function is largely overlooked. CTC-based methods, widely used in practice for their good balance between accuracy and inference speed, still suffer from accuracy degradation, because CTC loss optimizes the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It adds a frame-wise regularization term to the CTC loss to emphasize individual supervision, and uses maximum a posteriori (MAP) estimation of the latent alignment to resolve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free: it requires no extra parameters, no additional inference latency, and no additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition accuracy by up to 2.6% without any of these drawbacks.
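
The idea of adding a frame-wise term on top of the CTC loss can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The sketch assumes that the latent alignment is taken as the per-frame argmax of the CTC posterior, which equals $\mathbf{P} - \mathbf{G}$ when $\mathbf{G}$ is the gradient of the CTC loss with respect to the pre-softmax logits. It also assumes that the distillation term is a frame-averaged cross-entropy weighted by a hypothetical factor `lam`. The function names, the weight, and the toy probabilities are all illustrative.

```python
import math


def ctc_posteriors(probs, label, blank=0):
    """Forward-backward over the blank-extended label (label must be feasible).

    probs[t][k] is the softmax probability of class k at frame t.
    Returns (nll, gamma); gamma[t][k] is the posterior probability that
    frame t emits class k given the label.
    """
    T, K = len(probs), len(probs[0])
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S = len(ext)

    def can_skip(s):
        # Jumping from s-2 to s is allowed only between different non-blank symbols.
        return s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]

    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            if can_skip(s):
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # beta[t][s]: probability of completing the label from state s after frame t.
    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1] = 1.0
    if S > 1:
        beta[T - 1][S - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1][s] * probs[t + 1][ext[s]]
            if s + 1 < S:
                b += beta[t + 1][s + 1] * probs[t + 1][ext[s + 1]]
            if s + 2 < S and can_skip(s + 2):
                b += beta[t + 1][s + 2] * probs[t + 1][ext[s + 2]]
            beta[t][s] = b

    likelihood = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    gamma = [[0.0] * K for _ in range(T)]
    for t in range(T):
        for s in range(S):
            gamma[t][ext[s]] += alpha[t][s] * beta[t][s] / likelihood
    return -math.log(likelihood), gamma


def dctc_style_loss(probs, label, lam=0.1, blank=0):
    """CTC loss plus a frame-wise cross-entropy against a pseudo-alignment.

    Assumption: z_star is the per-frame argmax of the CTC posterior, and
    lam is a hypothetical weight; neither is taken from the paper's text.
    """
    nll, gamma = ctc_posteriors(probs, label, blank)
    z_star = [max(range(len(g)), key=g.__getitem__) for g in gamma]
    distill = -sum(math.log(probs[t][k]) for t, k in enumerate(z_star)) / len(z_star)
    return nll + lam * distill, z_star


# Toy input: 4 frames, 3 classes (0 = blank), target label [1, 2].
example_probs = [
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.1, 0.2],
]
total, z_star = dctc_style_loss(example_probs, [1, 2])
print(total, z_star)
```

In a real training loop the alignment would be treated as a fixed target (no gradient flows through it), which is what makes the scheme a self-distillation: the model supervises its own frame-wise outputs without a separate teacher.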

Paper Structure (17 sections, 10 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related works
Text recognition
CTC-related text recognition methods
Knowledge distillation on text recognition or CTC-based models
Methods
The Distillation Loss Term in CTC Scenario
Estimation of Latent Alignment $\mathbf{z}$
Summary of DCTC loss
Experiments
Datasets
Implementation Details
A Model-wise Comparison
A Loss-wise Comparison
Comparison of Latent Alignment Estimate
...and 2 more sections

Figures (4)

Figure 1: An illustration of optimization and distillation on CTC- and attention-based models, including the alignment inconsistency problem
Figure 2: The Architecture of DCTC in Self-distillation Scheme
Figure 3: Curves of AACC of Estimated Latent Alignment
Figure 4: Feature visualization. Each row represents a hard sample cluster
