Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library

Solène Tarride; Yoann Schneider; Marie Generali-Lince; Mélodie Boillet; Bastien Abadie; Christopher Kermorvant

TL;DR

This work extends the PyLaia ATR library with calibrated confidence scores and explicit n-gram language modeling during decoding, enabling language-model-aware predictions without expert setup. The authors implement several confidence-estimation methods, show that temperature scaling improves calibration, and integrate SRILM/KenLM-based n-gram language models at the character, subword, and word levels into beam-search decoding. Across twelve diverse datasets, LM decoding reduces CER by about 11.9% and WER by about 12.9% on average, although the gains vary with language, script, and data difficulty. They release 12 pretrained models on Hugging Face along with documentation, and discuss speed trade-offs, recommending batch processing to balance accuracy and throughput in practical ATR deployments.
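
A minimal sketch of temperature-scaled confidence, assuming NumPy arrays of per-frame logits from a CTC model. A single temperature T is fitted on held-out data by minimizing negative log-likelihood (a grid search stands in for the usual gradient-based fit), and the scaled probabilities then attach a confidence score to a greedy CTC transcription. The function names, the grid range, and the confidence definition (mean top-class probability at the first frame of each emitted character) are hypothetical illustrations, not PyLaia's API or its exact scoring method.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def nll(logits, labels, temperature):
    # Mean negative log-likelihood of the reference labels after dividing logits by T.
    logp = log_softmax(logits / temperature)
    return float(-logp[np.arange(len(labels)), labels].mean())


def fit_temperature(logits, labels, grid=None):
    # Hypothetical helper: pick the T that minimizes NLL on held-out data.
    # A grid search stands in for the usual gradient-based optimization.
    if grid is None:
        grid = np.linspace(0.5, 10.0, 96)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def greedy_ctc_with_confidence(logits, temperature=1.0, blank=0):
    # Hypothetical helper: greedy CTC decoding of a (frames, classes) logit matrix.
    # Confidence = mean top-class probability at the first frame of each emitted label.
    probs = np.exp(log_softmax(logits / temperature))
    best = probs.argmax(axis=-1)
    labels, scores, prev = [], [], None
    for frame, k in enumerate(best):
        # CTC collapse: skip blanks and repeats of the previous frame's label.
        if k != blank and k != prev:
            labels.append(int(k))
            scores.append(float(probs[frame, k]))
        prev = k
    confidence = float(np.mean(scores)) if scores else 0.0
    return labels, confidence
```

Dividing by T > 1 flattens each frame's distribution without changing its argmax, so the transcription stays the same and only the confidence values move. This is why temperature scaling is a cheap post-hoc calibration step.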

Abstract

PyLaia is one of the most popular open-source software libraries for Automatic Text Recognition (ATR), delivering strong performance in terms of both speed and accuracy. In this paper, we outline our recent contributions to the PyLaia library, focusing on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding. Our implementation provides an easy way to combine PyLaia with n-gram language models at different levels (character, subword, and word). One of the highlights of this work is that language models are completely auto-tuned: they can be built and used easily without any expert knowledge and without requiring any additional data. To demonstrate the significance of our contribution, we evaluate PyLaia's performance on twelve datasets, both with and without language modeling. The results show that decoding with small language models improves the Word Error Rate by 13% and the Character Error Rate by 12% on average. Additionally, we analyze confidence scores and highlight the importance of calibration techniques. Our implementation is publicly available in the official PyLaia repository at https://gitlab.teklia.com/atr/pylaia, and twelve open-source models are released on Hugging Face.
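
The core idea of language-model-aware decoding can be sketched in a few lines of Python, with a loudly stated substitution: PyLaia integrates SRILM/KenLM-built n-gram models into beam-search decoding, whereas this sketch uses a toy character-level trigram model with add-k smoothing to rescore an N-best list. Each hypothesis's optical log-probability is combined with a weighted LM log-probability and an optional length bonus. The class `CharNGramLM`, the function `rescore`, and the parameters `lm_weight` and `length_bonus` are hypothetical illustrations, not PyLaia options.

```python
import math
from collections import Counter


class CharNGramLM:
    """Hypothetical toy character n-gram LM with add-k smoothing."""

    def __init__(self, corpus, n=3, k=0.1):
        self.n, self.k = n, k
        # Vocabulary of observed characters plus the end-of-line symbol.
        self.vocab_size = len(set("".join(corpus))) + 1
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        for line in corpus:
            tokens = self._pad(line)
            for i in range(n - 1, len(tokens)):
                context = tuple(tokens[i - n + 1:i])
                self.ngram_counts[context + (tokens[i],)] += 1
                self.context_counts[context] += 1

    def _pad(self, text):
        # n-1 start symbols, the characters, then one end symbol.
        return ["<s>"] * (self.n - 1) + list(text) + ["</s>"]

    def log_prob(self, text):
        # Natural-log probability of a full line under the smoothed model.
        tokens = self._pad(text)
        total = 0.0
        for i in range(self.n - 1, len(tokens)):
            context = tuple(tokens[i - self.n + 1:i])
            num = self.ngram_counts[context + (tokens[i],)] + self.k
            den = self.context_counts[context] + self.k * self.vocab_size
            total += math.log(num / den)
        return total


def rescore(nbest, lm, lm_weight=0.5, length_bonus=0.0):
    # Hypothetical helper: choose the hypothesis maximizing
    # optical log-prob + lm_weight * LM log-prob + length_bonus * length.
    def combined(hyp):
        text, optical_logp = hyp
        return optical_logp + lm_weight * lm.log_prob(text) + length_bonus * len(text)

    return max(nbest, key=combined)[0]
```

For example, with an LM trained on the toy corpus `["the cat sat", "the hat", "that is the cat"]`, `rescore([("tha cat", -4.0), ("the cat", -4.2)], lm)` picks "the cat" at `lm_weight=0.5` but "tha cat" at `lm_weight=0`. The language model overrules a slightly better-scored but implausible spelling. In a real system, weights like these would be tuned on validation data rather than set by hand.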
