Digitizing Nepal's Written Heritage: A Comprehensive HTR Pipeline for Old Nepali Manuscripts

Anjali Sarawgi, Esteban Garces Arias, Christof Zotter

TL;DR

This work tackles Handwritten Text Recognition for Old Nepali, a low-resource historical script, by building an end-to-end, three-stage pipeline that bridges synthetic, printed, and handwritten data. It combines transformer-based encoders (TrOCR variants, Swin), script-aware decoding, and data-centric refinements (transcription normalization, extensive augmentation) to achieve a CER of 4.9% on line-level transcription. Key contributions include a three-stage transfer learning strategy, detailed analysis of tokenization and decoding strategies, and a publicly available codebase for reproducibility in HTR for historical Devanagari scripts. The results demonstrate that data quality and augmentation outperform architectural changes, and the approach enables practical digitization and analysis of Nepal's manuscript heritage.

Abstract

This paper presents the first end-to-end pipeline for Handwritten Text Recognition (HTR) for Old Nepali, a historically significant but low-resource language. We adopt a line-level transcription approach and systematically explore encoder-decoder architectures and data-centric techniques to improve recognition accuracy. Our best model achieves a Character Error Rate (CER) of 4.9%. In addition, we implement and evaluate decoding strategies and analyze token-level confusions to better understand model behaviour and error patterns. While the dataset we used for evaluation is confidential, we release our training code, model configurations, and evaluation scripts to support further research in HTR for low-resource historical scripts.
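To make the reported metric concrete, the sketch below computes a character error rate in the usual way: the Levenshtein (edit) distance between a reference line and a predicted line, divided by the reference length. It is a minimal illustration, not the authors' evaluation script. The function names `levenshtein` and `cer` are hypothetical, and two choices are assumptions: counting characters as Unicode code points (so a Devanagari vowel sign counts as its own character) and applying NFC normalization before comparison.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    # Assumption: normalize both strings to NFC and count code points.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One missing vowel sign in a five-code-point reference.
    print(cer("नेपाल", "नेपल"))
```

Under this definition, a CER of 4.9% corresponds to roughly five character edits per hundred reference characters. Corpus-level figures are often computed by summing edit distances and reference lengths over all lines before dividing, rather than averaging per-line rates; which of the two the paper uses is not stated in this section.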

Paper Structure

This paper contains 51 sections, 3 equations, 22 figures, 20 tables.

Figures (22)

  • Figure 1: Sample manuscript containing Old Nepali in Devanagari script. This image is sourced and cropped from the Documenta Nepalica [saha1832documenta], courtesy of Manik Bajracharya.
  • Figure 2: Sample of line image after pre-processing.
  • Figure 3: Sample of processed data for the first stage.
  • Figure 4: Sample of processed data for the second stage.
  • Figure 5: Printed Nagari script sample (top) and model's predicted transcription (bottom).
  • ...and 17 more figures