Towards Writing Style Adaptation in Handwriting Recognition
Jan Kohút, Michal Hradiš, Martin Kišš
TL;DR
This paper tackles handwriting recognition across many writing styles by introducing WS-Net, a CNN+RNN architecture augmented with a Writer Style Block (WSB) that conditions normalization on writer-style embeddings derived from a Writer Style Identifier. The WSB makes processing explicitly style-aware via adaptive instance normalization, and the authors explore both randomly initialized and contrastively pre-trained writer embeddings. Pre-trained writer-style embeddings consistently improve transcription accuracy in writer-dependent settings, whereas learning embeddings from scratch is less reliable. In writer-independent settings, a simple baseline such as fine-tuning can outperform WS-Net, which points to the need for a more robust mechanism (e.g., attention-based aggregation) for handling unseen writers. The work demonstrates the feasibility of explicit style adaptation in handwriting recognition and provides a foundation for future style-aware or attention-guided approaches to closing the writer-independent performance gap with minimal overhead.
Abstract
One of the challenges of handwriting recognition is to transcribe a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may limit overall accuracy due to various ambiguities. We explore models with writer-dependent parameters which take the writer's identity as an additional input. The proposed models can be trained on datasets with partitions likely written by a single author (e.g., a single letter, diary, or chronicle). We propose a Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of the partitions. We experimented with various placements and settings of WSB and with contrastively pre-trained embeddings. We show that our approach outperforms a baseline with no WSB in a writer-dependent scenario and that it is possible to estimate embeddings for new writers. However, domain adaptation using simple fine-tuning in a writer-independent setting provides superior accuracy at a similar computational cost. The proposed approach should be further investigated in terms of training stability and embedding regularization in order to surpass this baseline.
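To make the idea of an adaptive instance normalization layer conditioned on a writer embedding more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single feature map of shape (channels, height, width), one embedding vector per writer partition, and two linear projections that map the embedding to a per-channel scale and shift. The class name `WriterStyleBlock`, the weight names, the `1 + ...` offset on the scale, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


class WriterStyleBlock:
    """Illustrative AdaIN-style layer (assumed design, not the paper's code).

    Each channel is normalized over its spatial positions, then rescaled and
    shifted with parameters predicted from the writer's embedding.
    """

    def __init__(self, num_writers, embed_dim, channels, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.eps = eps
        # One embedding per writer partition; randomly initialized here,
        # whereas in training these would be learned or pre-trained.
        self.embeddings = rng.normal(0.0, 0.1, size=(num_writers, embed_dim))
        # Hypothetical linear maps from embedding to per-channel scale/shift.
        self.w_gamma = rng.normal(0.0, 0.1, size=(embed_dim, channels))
        self.w_beta = rng.normal(0.0, 0.1, size=(embed_dim, channels))

    def __call__(self, x, writer_id):
        # x has shape (channels, height, width).
        mean = x.mean(axis=(1, 2), keepdims=True)
        var = x.var(axis=(1, 2), keepdims=True)
        x_norm = (x - mean) / np.sqrt(var + self.eps)

        e = self.embeddings[writer_id]
        gamma = 1.0 + e @ self.w_gamma  # scale, kept close to 1 (assumption)
        beta = e @ self.w_beta          # shift
        return gamma[:, None, None] * x_norm + beta[:, None, None]
```

With this formulation, the same feature map yields different outputs for different writer identities, while each output channel's mean equals the shift predicted for that writer.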
