Handwritten Text Recognition: A Survey

Carlos Garrido-Munoz, Antonio Rios-Vila, Jorge Calvo-Zaragoza

TL;DR

The survey addresses the problem of handwritten text recognition by organizing approaches into two granularity levels: up to line-level (words and lines) and beyond line-level (paragraphs and documents). It traces the methodological evolution from handcrafted, segmentation-based systems to end-to-end models based on CTC, sequence-to-sequence, and Transformer architectures, including beyond-line strategies like attention masking and line unfolding. It highlights benchmarking practices, datasets (real and synthetic), and evaluation metrics, while noting challenges around generalization, reproducibility, and data availability. The paper argues for standardized evaluation and broader multilingual benchmarks, and it emphasizes promising directions such as Vision-Language Models and self-supervised learning to advance HTR, especially for historical and low-resource scripts.

Abstract

Handwritten Text Recognition (HTR) has become an essential field within pattern recognition and machine learning, with applications spanning historical document preservation to modern data entry and accessibility solutions. The complexity of HTR lies in the high variability of handwriting, which makes it challenging to develop robust recognition systems. This survey examines the evolution of HTR models, tracing their progression from early heuristic-based approaches to contemporary state-of-the-art neural models, which leverage deep learning techniques. The scope of the field has also expanded, with models initially capable of recognizing only word-level content progressing to recent end-to-end document-level approaches. Our paper categorizes existing work into two primary levels of recognition: (1) up to line-level, encompassing word and line recognition, and (2) beyond line-level, addressing paragraph- and document-level challenges. We provide a unified framework that examines research methodologies, recent advances in benchmarking, key datasets in the field, and a discussion of the results reported in the literature. Finally, we identify pressing research challenges and outline promising future directions, aiming to equip researchers and practitioners with a roadmap for advancing the field.

Paper Structure

This paper contains 42 sections, 14 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Overview of the different levels of granularity in Handwritten Text Recognition (HTR) systems, ranging from word-level to document-level transcription. The hierarchical structure illustrates the increasing complexity of HTR approaches as they seek to handle more extensive handwritten content, reflecting the advancements in this field.
  • Figure 2: Timeline of milestones in Handwritten Text Recognition (HTR). We categorize them into four levels: datasets and competitions (green), general methods/architectures (yellow), up-to-line models (red), and beyond-line models (blue).
  • Figure 3: Reading Order (RO) with a single direction at the line level.
  • Figure 4: Reading Order (RO) with two directions at the paragraph level.
  • Figure 5: Reading Order (RO) with three directions at the document level. In this case, the third RO is not necessarily consistent or trivial, and each manuscript may define its own rules regarding this aspect.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3