Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages

Kaushal Santosh Bhogale; Tahir Javed; Greeshma Susan John; Dhruv Rathi; Akshayasree Padmanaban; Niharika Parasa; Mitesh M. Khapra

TL;DR

This work demonstrates that OIWER (Orthographically-Informed Word Error Rate), by accounting for permissible orthographic variations, reduces pessimistic error rates, narrows inflated gaps between models, and aligns more closely with human perception than prior methods such as WER-SN.

Abstract

Evaluating ASR systems for Indian languages is challenging because of spelling variations, flexible suffix splitting, and non-standard spellings of code-mixed words. Traditional Word Error Rate (WER) often presents a bleaker picture of system performance than human users perceive. Aligning evaluation more closely with real-world performance requires capturing permissible orthographic variations, which is extremely challenging for under-resourced Indian languages. Leveraging recent advances in LLMs, we propose a framework for creating benchmarks that capture these permissible variations, together with a corresponding metric, the Orthographically-Informed Word Error Rate (OIWER). Through extensive experiments, we demonstrate that OIWER, by accounting for orthographic variations, reduces pessimistic error rates (an average improvement of 6.3 points), narrows inflated model gaps (e.g., the Gemini-Canary performance gap drops from 18.1 to 11.5 points), and aligns more closely with human perception, by 4.9 points, than prior methods such as WER-SN.
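To make the idea of scoring against permissible variations concrete, here is a minimal sketch of a variant-aware word error rate. It is not the paper's definition of OIWER, which this page does not give. It assumes each reference position carries a set of acceptable spellings and that a hypothesis word matching any of them costs nothing. The function name `variant_aware_wer` and the romanized toy words are hypothetical, and multi-word variations such as suffix splitting are not handled.

```python
from typing import List, Set


def variant_aware_wer(reference: List[Set[str]], hypothesis: List[str]) -> float:
    """Word-level edit distance in which each reference slot accepts any of its variants.

    Illustrative assumption: reference[i] is the set of permissible spellings
    for the i-th reference word. This is a sketch, not the paper's metric.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits aligning the first i reference slots
    # with the first j hypothesis words.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = j  # insert every hypothesis word

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A hypothesis word that matches any permitted variant is not an error.
            sub_cost = 0 if hypothesis[j - 1] in reference[i - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost,  # match or substitution
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
            )
    return dp[n][m] / n


# Toy example (hypothetical words, not taken from the paper's benchmarks).
hyp = ["namastey", "dosto"]
single_reference = [{"namaste"}, {"doston"}]
with_variants = [{"namaste", "namastey"}, {"doston", "dosto"}]
plain_score = variant_aware_wer(single_reference, hyp)
variant_score = variant_aware_wer(with_variants, hyp)
```

With singleton sets the function reduces to ordinary WER, so the same routine can score a transcript both ways and expose how much of the error comes from spelling variation alone.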

Paper Structure (14 sections, 3 figures, 3 tables)

This paper contains 14 sections, 3 figures, 3 tables.

Introduction
Related Work
Inflated WER in Indian Languages
A framework to create Orthographically Informed (OI) benchmarks
Identifying types of variations
LLM-assisted generation of word variations
Orthographically-Informed Word Error Rate
Experimental Setup
Results and Discussion
Evaluating ASR models using OIWER
Does OIWER better reflect human-perceived WER?
Do variations reduce substitution errors?
Can LLM-generated variations be used as a proxy for human-corrected variations?
Conclusion

Figures (3)

Figure 1: For Indian languages, WER is inflated, with values much higher than the perceived error (orange and blue). Moreover, there is a large discrepancy between two valid transcripts (black).
Figure 2: OIWER best aligns with perceived WER across languages.
Figure 3: (a) OIWER eliminates false substitutions. (b) OIWER with LLM-generated variations shows a strong correlation with human-corrected variations.
