Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts
Chandrashekar M S, Vineet Singh, Lakshmi Pedapudi
TL;DR
This work addresses the problem of robust ASR for agricultural advisory across high- and low-resource Indian languages by introducing a domain-aware benchmarking framework. It develops Agriculture Weighted Word Error Rate (AWWER) and an LLM-based utility score to emphasize critical agricultural terminology, and benchmarks 10 ASR systems on 10,934 field recordings in Hindi, Telugu, and Odia. Key findings reveal language-specific performance, substantial gains from speaker diarization in multi-speaker settings, and divergent rankings between conventional WER and domain-aware metrics. The results offer practical deployment guidance, including model selection by language and the necessity of post-processing to protect agricultural term fidelity, while highlighting limitations and avenues for open benchmarking and extension. Overall, the paper provides a foundational public benchmark and analytic toolkit for advancing agricultural ASR in India's multilingual landscape, with implications for scalable, domain-sensitive voice advisory. AWWER, WER, and the utility scores together illuminate not just transcription accuracy but real-world advisory usefulness in low-resource domains.
Abstract
The digitization of agricultural advisory services in India requires robust Automatic Speech Recognition (ASR) systems capable of accurately transcribing domain-specific terminology in multiple Indian languages. This paper presents a benchmarking framework for evaluating ASR performance in agricultural contexts across Hindi, Telugu, and Odia. We introduce evaluation metrics, including Agriculture Weighted Word Error Rate (AWWER) and domain-specific utility scoring, to complement traditional metrics. Our evaluation of 10,934 audio recordings, each transcribed by up to 10 ASR models, reveals performance variations across languages and models, with Hindi achieving the best overall performance (WER: 16.2%) while Odia presents the greatest challenges (best WER: 35.1%, achieved only with speaker diarization). We characterize audio quality challenges inherent to real-world agricultural field recordings and demonstrate that speaker diarization with best-speaker selection can substantially reduce WER for multi-speaker recordings (by up to 66%, depending on the proportion of multi-speaker audio). We identify recurring error patterns in agricultural terminology and provide practical recommendations for improving ASR systems in low-resource agricultural domains. The study establishes baseline benchmarks for future agricultural ASR development.
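The abstract does not state the AWWER formula, so the following is only an illustrative sketch of how a term-weighted error rate can diverge from plain WER. It assumes a simple scheme in which each word carries a weight (agricultural terms weigh more), errors are charged at the weight of the word involved, and the total is normalized by the summed weight of the reference words. The function names, the weighting scheme, and the example weights are all hypothetical and are not taken from the paper.

```python
def _alignment_cost(ref_words, hyp_words, weight):
    """Minimum weighted edit cost to turn ref_words into hyp_words (word level)."""
    n, m = len(ref_words), len(hyp_words)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + weight(ref_words[i - 1])  # deletions
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + weight(hyp_words[j - 1])  # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r, h = ref_words[i - 1], hyp_words[j - 1]
            # A match is free; a substitution is charged at the reference word's weight.
            diag = dp[i - 1][j - 1] + (0.0 if r == h else weight(r))
            delete = dp[i - 1][j] + weight(r)
            insert = dp[i][j - 1] + weight(h)
            dp[i][j] = min(diag, delete, insert)
    return dp[n][m]


def wer(reference, hypothesis):
    """Conventional word error rate: every word has weight 1."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return _alignment_cost(ref_words, hyp_words, lambda w: 1.0) / len(ref_words)


def weighted_wer(reference, hypothesis, term_weights, default_weight=1.0):
    """Hypothetical term-weighted WER (an assumption, not the paper's AWWER definition)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    def weight(word):
        return term_weights.get(word, default_weight)

    total_weight = sum(weight(w) for w in ref_words)
    return _alignment_cost(ref_words, hyp_words, weight) / total_weight


# Illustrative weights only: domain terms count four times as much as other words.
AGRI_WEIGHTS = {"urea": 4.0, "paddy": 4.0}
REFERENCE = "spray urea on paddy"
HYP_TERM_ERROR = "spray area on paddy"    # error on a domain term
HYP_FILLER_ERROR = "spray urea in paddy"  # error on a function word
```

Under these assumed weights, both hypotheses have the same plain WER (one error in four words), but the weighted score penalizes the domain-term error more heavily than the function-word error. This is one way a conventional metric and a domain-aware metric can rank systems differently.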
