Fine-tuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition

Jan Kohút, Michal Hradiš

TL;DR

The paper tackles domain shift in handwriting recognition by proposing a simple yet effective baseline: fine-tuning a pre-trained CTC-based CNN-RNN model on a small number of annotated target lines, supplemented with targeted data augmentations. It demonstrates substantial relative improvements in both writer-independent (20–45% reductions in CER) and writer-dependent settings, even with as few as 1–2 adaptation lines, and analyzes augmentation choices, stopping criteria, and pre-trained model quality. The authors introduce a large Czech handwriting dataset (CzechHWR) to support domain adaptation research and provide practical guidance for live deployment, including fixed iteration ratios that outperform cross-validation in some cases. Overall, the work establishes fine-tuning with carefully chosen augmentations as a robust, scalable baseline for domain adaptation in handwritten text recognition and offers actionable recommendations for real-world systems like PERO OCR.

Abstract

In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that in the case of neural networks trained for handwriting recognition using CTC, simple fine-tuning with data augmentation works surprisingly well in such scenarios and that it is resistant to overfitting even for very small target domain datasets. We evaluated the behavior of fine-tuning with respect to augmentation, training data size, and quality of the pre-trained network, both in writer-dependent and writer-independent settings. On a large real-world dataset, fine-tuning on new writers provided an average relative CER improvement of 25 % for 16 text lines and 50 % for 256 text lines.
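
The abstract reports results as a relative CER improvement. The sketch below shows how such a figure is commonly computed, assuming CER is the total character-level edit distance divided by the total number of reference characters; the paper's exact evaluation code is not shown here, and the function names (`edit_distance`, `cer`, `relative_cer_reduction`) and all numbers are hypothetical illustrations.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Character error rate over a set of text lines (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def relative_cer_reduction(baseline_cer: float, adapted_cer: float) -> float:
    """Fraction of the baseline CER removed by adaptation."""
    return (baseline_cer - adapted_cer) / baseline_cer


# Illustrative numbers only, not results from the paper:
# a baseline CER of 8 % reduced to 6 % is a 25 % relative improvement.
print(relative_cer_reduction(0.08, 0.06))
```

Reporting the relative rather than the absolute change makes writers with very different baseline error rates comparable, which matters when results are averaged over many target writers.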

Paper Structure (12 sections, 1 equation, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Black, samples from the large general source CzechHWR dataset. ID with Color, representative words of 19 target writers.
  • Figure 2: Augmented versions of top left text line image with NoiseBlurGamma augmentation. Intensity 1 is shown in the top section and intensity 2 in the bottom one. Only the extreme samples of the distributions are shown.
  • Figure 3: Augmented versions of top left text line image with Color augmentation. Intensity 1 is shown in the top section and intensity 2 in the bottom one. Only the extreme samples of the distributions are shown.
  • Figure 4: Augmented versions of top left text line image with Geometry augmentation. Intensity 1 is shown in the top section, intensity 2 in the middle one, and intensity 3 in the bottom one. Only the extreme samples of the distributions are shown.
  • Figure 5: The performance of models fine-tuned with different augmentations expressed as a relative reduction of the baseline model test CER. The means and the standard deviations represent the target writer distribution.
  • ...and 5 more figures