Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz, Antonio Rios-Vila, Jorge Calvo-Zaragoza
TL;DR
The survey addresses the problem of handwritten text recognition by organizing approaches into two granularity levels: up to line-level (words and lines) and beyond line-level (paragraphs and documents). It traces the methodological evolution from handcrafted, segmentation-based systems to end-to-end models based on CTC, sequence-to-sequence, and Transformer architectures, including beyond-line strategies such as attention masking and line unfolding. It reviews benchmarking practices, datasets (real and synthetic), and evaluation metrics, while noting challenges around generalization, reproducibility, and data availability. The paper argues for standardized evaluation and broader multilingual benchmarks, and points to promising directions such as Vision-Language Models and self-supervised learning to advance HTR, especially for historical and low-resource scripts.
Abstract
Handwritten Text Recognition (HTR) has become an essential field within pattern recognition and machine learning, with applications ranging from historical document preservation to modern data entry and accessibility solutions. The complexity of HTR lies in the high variability of handwriting, which makes it challenging to develop robust recognition systems. This survey examines the evolution of HTR models, tracing their progression from early heuristic-based approaches to contemporary state-of-the-art neural models based on deep learning. The scope of the field has also expanded: early models could recognize only word-level content, whereas recent end-to-end approaches operate at the document level. Our paper categorizes existing work into two primary levels of recognition: (1) up to line-level, encompassing word and line recognition, and (2) beyond line-level, addressing paragraph- and document-level challenges. We provide a unified framework that examines research methodologies, recent advances in benchmarking, and key datasets in the field, and that discusses the results reported in the literature. Finally, we identify pressing research challenges and outline promising future directions, aiming to equip researchers and practitioners with a roadmap for advancing the field.
