Fine-tuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition
Jan Kohút, Michal Hradiš
TL;DR
The paper tackles domain shift in handwriting recognition by proposing a simple yet effective baseline: fine-tuning a pre-trained CTC-based CNN-RNN model on a small number of annotated target lines supplemented with targeted data augmentations. It demonstrates substantial relative improvements in both writer-independent (20–45% reductions in CER) and writer-dependent settings, even with as few as 1–2 adaptation lines, and analyzes augmentation choices, stopping criteria, and pre-trained model quality. The authors introduce a large Czech handwriting dataset (CzechHWR) to support domain adaptation research and provide practical guidance for live deployment, including fixed iteration ratios that outperform cross-validation in some cases. Overall, the work establishes fine-tuning with carefully chosen augmentations as a robust, scalable baseline for domain adaptation in handwritten text recognition and offers actionable recommendations for real-world systems like PERO OCR.
Abstract
In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that in the case of neural networks trained for handwriting recognition using CTC, simple fine-tuning with data augmentation works surprisingly well in such scenarios and that it is resistant to overfitting even for very small target domain datasets. We evaluated the behavior of fine-tuning with respect to augmentation, training data size, and quality of the pre-trained network, both in writer-dependent and writer-independent settings. On a large real-world dataset, fine-tuning on new writers provided an average relative CER improvement of 25 % for 16 text lines and 50 % for 256 text lines.
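The abstract reports results as relative CER improvement. As a minimal sketch of how such a figure is typically computed, assuming CER is the total character-level Levenshtein distance divided by the total number of reference characters (the usual definition; the paper's exact evaluation protocol is not given here), the following may help. The function names and the toy strings are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Character error rate over a set of text lines."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars


def relative_improvement(cer_before: float, cer_after: float) -> float:
    """Fraction of the baseline CER removed by adaptation."""
    return (cer_before - cer_after) / cer_before


# Toy example with invented transcriptions (not data from the paper).
refs = ["hello world"]
cer_general = cer(refs, ["hallo wurld"])     # 2 errors over 11 characters
cer_fine_tuned = cer(refs, ["hello wurld"])  # 1 error over 11 characters
print(f"{relative_improvement(cer_general, cer_fine_tuned):.1%}")  # prints 50.0%
```

Under this definition, a relative improvement of 25 % means that a quarter of the general model's character errors disappear after fine-tuning, regardless of how high or low the baseline CER was.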
