Digitizing Nepal's Written Heritage: A Comprehensive HTR Pipeline for Old Nepali Manuscripts
Anjali Sarawgi, Esteban Garces Arias, Christof Zotter
TL;DR
This work tackles Handwritten Text Recognition (HTR) for Old Nepali, a low-resource historical language written in Devanagari, by building an end-to-end, three-stage pipeline that bridges synthetic, printed, and handwritten data. It combines transformer-based encoders (TrOCR variants, Swin), script-aware decoding, and data-centric refinements (transcription normalization, extensive augmentation) to achieve a Character Error Rate (CER) of 4.9% on line-level transcription. Key contributions include the three-stage transfer learning strategy, a detailed analysis of tokenization and decoding strategies, and a publicly available codebase for reproducible HTR research on historical Devanagari scripts. The results indicate that improvements in data quality and augmentation yield larger gains than architectural changes, and that the approach enables practical digitization and analysis of Nepal's manuscript heritage.
Abstract
This paper presents the first end-to-end pipeline for Handwritten Text Recognition (HTR) for Old Nepali, a historically significant but low-resource language. We adopt a line-level transcription approach and systematically explore encoder-decoder architectures and data-centric techniques to improve recognition accuracy. Our best model achieves a Character Error Rate (CER) of 4.9%. In addition, we implement and evaluate decoding strategies and analyze token-level confusions to better understand model behaviour and error patterns. While the dataset we used for evaluation is confidential, we release our training code, model configurations, and evaluation scripts to support further research in HTR for low-resource historical scripts.
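To make the reported metric concrete, the following is a minimal sketch of how a line-level Character Error Rate is commonly computed: the total Levenshtein edit distance between reference and predicted lines, divided by the total number of reference characters. It is not the authors' evaluation script. The function names `levenshtein` and `cer` are hypothetical, and two choices here are assumptions: characters are counted as Unicode code points, and both strings are NFC-normalized before comparison. The paper may count characters differently, for example as grapheme clusters.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    # Assumption: NFC normalization, so equivalent code point sequences compare equal.
    refs = [unicodedata.normalize("NFC", r) for r in references]
    hyps = [unicodedata.normalize("NFC", h) for h in hypotheses]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


print(cer(["abcd"], ["abxd"]))  # one substitution over four characters: 0.25
```

Under this definition, a CER of 4.9% corresponds to roughly 5 character edits per 100 reference characters. Because Devanagari combines consonants, vowel signs, and other marks, the choice between code points and grapheme clusters as the counting unit can shift the reported number, so the unit should be stated alongside any CER figure.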
