Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition

Solène Tarride, Christopher Kermorvant

TL;DR

The study investigates whether explicit n-gram language models (LMs) still add value to modern automatic text recognition (ATR) of handwriting when combined with PyLaia and DAN, evaluated on IAM, RIMES, and NorHand v2. It builds n-gram LMs at character, subword, and word levels with KenLM and SRILM and applies them during beam-search decoding, tuning the order (up to 6), the LM weight (best around 1.5 for character and subword models and 0.5 for word models), and Kneser-Ney smoothing. Results show consistent gains from explicit LMs: character-level LMs deliver the strongest improvements, and DAN with a character LM outperforms current benchmarks on several datasets, especially at page level. Subword LMs provide mixed benefits, and word-level LMs often harm performance. The findings support hybrid explicit-implicit language modeling in ATR, highlight a practical speed tradeoff due to CPU-based LM rescoring, and suggest future work on blending multiple tokenization granularities to handle diverse scripts and historical texts robustly.

Abstract

In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition. We evaluate two prominent neural network architectures, PyLaia and DAN, with and without the integration of explicit n-gram language models. Our experiments on three datasets - IAM, RIMES, and NorHand v2 - at both line and page level, investigate optimal parameters for n-gram models, including their order, weight, smoothing methods and tokenization level. The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets, challenging the notion that deep learning models alone are sufficient for optimal performance. In particular, the combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.
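
To make the role of the LM weight concrete, here is a minimal sketch of combining a recognizer's score with a character n-gram score. It is an illustration under several assumptions, not the authors' pipeline: it uses a toy character bigram with add-one smoothing (the study uses orders up to 6 with Kneser-Ney smoothing via KenLM and SRILM), it rescores a fixed list of hypotheses instead of running beam search, and the corpus, hypotheses, optical scores, and function names are all hypothetical.

```python
import math
from collections import Counter


def train_char_bigram(corpus):
    """Count character bigrams and their contexts; '^' marks the line start."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for line in corpus:
        padded = "^" + line
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def lm_log_prob(text, model):
    """Log-probability of text under the bigram model with add-one smoothing."""
    bigrams, contexts, vocab_size = model
    padded = "^" + text
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def rescore(hypotheses, model, lm_weight):
    """Return the text maximizing optical log-score + lm_weight * LM log-prob."""
    best = max(
        hypotheses,
        key=lambda hyp: hyp[1] + lm_weight * lm_log_prob(hyp[0], model),
    )
    return best[0]


# Hypothetical training text and recognizer outputs (text, optical log-score).
corpus = ["the cat sat on the mat", "the hat is on the cat"]
model = train_char_bigram(corpus)
hypotheses = [("tne cat", -1.0), ("the cat", -1.2)]

best_without_lm = rescore(hypotheses, model, lm_weight=0.0)
best_with_lm = rescore(hypotheses, model, lm_weight=1.5)
```

With a weight of 0 the recognizer's slightly preferred misreading wins. With a weight of 1.5 (the value the TL;DR reports as best for character models) the LM penalizes the unseen character sequence and the correct reading is selected.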

Paper Structure (32 sections, 1 figure, 7 tables)

This paper contains 32 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Examples of pages from the three datasets used in this work. Note that we use full pages for RIMES and paragraphs for IAM (to exclude the printed header) and NorHand v2 (to simplify reading order). Paragraphs are highlighted.