Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts

Chandrashekar M S, Vineet Singh, Lakshmi Pedapudi

TL;DR

This work addresses the problem of robust ASR for agricultural advisory across high- and low-resource Indian languages by introducing a domain-aware benchmarking framework. It develops Agriculture Weighted Word Error Rate (AWWER) and an LLM-based utility score to emphasize critical agricultural terminology, and benchmarks 10 ASR systems on 10,934 field recordings in Hindi, Telugu, and Odia. Key findings reveal language-specific performance, substantial gains from speaker diarization in multi-speaker settings, and divergent rankings between conventional WER and domain-aware metrics. The results offer practical deployment guidance, including model selection by language and the necessity of post-processing to protect agricultural term fidelity, while highlighting limitations and avenues for open benchmarking and extension. Overall, the paper provides a foundational public benchmark and analytic toolkit for advancing agricultural ASR in India's multilingual landscape, with implications for scalable, domain-sensitive voice advisory. AWWER, WER, and the utility scores together illuminate not just transcription accuracy but real-world advisory usefulness in low-resource domains.

Abstract

The digitization of agricultural advisory services in India requires robust Automatic Speech Recognition (ASR) systems capable of accurately transcribing domain-specific terminology in multiple Indian languages. This paper presents a benchmarking framework for evaluating ASR performance in agricultural contexts across Hindi, Telugu, and Odia. We introduce evaluation metrics, including Agriculture Weighted Word Error Rate (AWWER) and domain-specific utility scoring, to complement traditional metrics. Our evaluation of 10,934 audio recordings, each transcribed by up to 10 ASR models, reveals performance variations across languages and models, with Hindi achieving the best overall performance (WER: 16.2%) while Odia presents the greatest challenges (best WER: 35.1%, achieved only with speaker diarization). We characterize audio quality challenges inherent to real-world agricultural field recordings and demonstrate that speaker diarization with best-speaker selection can substantially reduce WER for multi-speaker recordings (up to 66%, depending on the proportion of multi-speaker audio). We identify recurring error patterns in agricultural terminology and provide practical recommendations for improving ASR systems in low-resource agricultural domains. The study establishes baseline benchmarks for future agricultural ASR development.
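
To make the metrics named in the abstract concrete, the following is a minimal sketch of word error rate with optional per-term weights, plus a simple best-speaker selection step. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the AWWER formula, so the weighting scheme here (errors on a reference word cost that word's weight, insertions cost 1, and the total is normalized by the summed reference weights) is assumed, as are the function names `weighted_wer` and `best_speaker`, the example weights, and the choice to select the best speaker by scoring against the reference.

```python
from typing import Dict, List, Optional, Tuple


def weighted_wer(reference: str, hypothesis: str,
                 term_weights: Optional[Dict[str, float]] = None) -> float:
    """Word error rate with optional per-word weights (assumed scheme).

    With term_weights=None every word weighs 1.0 and the result is the
    standard WER: (substitutions + deletions + insertions) / reference length.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    weights = term_weights or {}
    # Cost of getting each reference word wrong (substituting or deleting it).
    cost = [weights.get(tok, 1.0) for tok in ref]
    total = sum(cost)
    if total == 0:
        return 0.0 if not hyp else float("inf")

    # prev[j]: minimum cost of aligning the reference prefix seen so far with hyp[:j].
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [prev[0] + cost[i - 1]]  # delete every reference word so far
        for j, h in enumerate(hyp, start=1):
            substitute = prev[j - 1] + (0.0 if r == h else cost[i - 1])
            delete = prev[j] + cost[i - 1]
            insert = curr[j - 1] + 1.0  # assumption: insertions always cost 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1] / total


def best_speaker(reference: str, speaker_transcripts: Dict[str, str],
                 term_weights: Optional[Dict[str, float]] = None) -> Tuple[str, float]:
    """Pick the diarized speaker whose transcript scores lowest (assumed selection rule)."""
    scores = {spk: weighted_wer(reference, text, term_weights)
              for spk, text in speaker_transcripts.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    ref = "spray urea on paddy"
    hyp = "spray area on paddy"
    print(weighted_wer(ref, hyp))                 # plain WER
    print(weighted_wer(ref, hyp, {"urea": 3.0}))  # same error, weighted more heavily
    print(best_speaker(ref, {"spk0": hyp, "spk1": "hello how are you"}))
```

In this toy example a single substitution on a domain term counts for more under the weighted score than under plain WER, which is the kind of effect that can make model rankings diverge between conventional and domain-aware metrics.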

Paper Structure

This paper contains 36 sections, 4 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Distribution of audio issue types across languages. Background talk dominates all three languages, reflecting real-world agricultural consultation settings.
  • Figure 2: Full transcript WER vs best-speaker WER across models and languages. Models with higher multi-speaker percentages show larger improvements from best-speaker selection.
  • Figure 3: Hindi: Treemap of top agricultural term confusion pairs by domain category. Box sizes represent error frequency, with colors indicating domain categories.
  • Figure 4: Odia: Treemap of top agricultural term confusion pairs by domain category. Box sizes represent error frequency, with colors indicating domain categories.
  • Figure 5: Telugu: Treemap of top agricultural term confusion pairs by domain category. Box sizes represent error frequency, with colors indicating domain categories.
  • ...and 1 more figure