Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems
Ajinkya Kulkarni, Atharva Kulkarni, Miguel Couceiro, Isabel Trancoso
TL;DR
This work investigates dual challenges in automatic speech recognition: biases across gender, age, and accents, and the environmental footprint of large ASR systems. By evaluating MMS and Whisper on bias-focused datasets Artie-Bias and CCv2, and measuring inference-time energy use and carbon emissions with three tracking tools across multiple GPUs, the study provides a comprehensive view of fairness and sustainability in real-world ASR. Findings show Whisper often outperforms MMS on read speech for bias metrics but can underperform on spontaneous speech, while MMS generally offers better sustainability; larger Whisper variants may underperform the medium size in some cases. The results highlight the importance of multi-metric benchmarking, the role of language adapters, and hardware characteristics in shaping both fairness and ecological impact, informing responsible deployment decisions in diverse linguistic contexts.
Abstract
In this paper, we present a bias- and sustainability-focused investigation of two Automatic Speech Recognition (ASR) systems, Whisper and Massively Multilingual Speech (MMS), both of which have achieved state-of-the-art (SOTA) performance. Despite their improved performance in controlled settings, there remains a critical gap in understanding their efficacy and equity in real-world scenarios. We analyze ASR biases with respect to gender, accent, and age group, as well as their effect on downstream tasks. In addition, we examine the environmental impact of ASR systems, scrutinizing the effect of large acoustic models on carbon emissions and energy consumption. We also provide insights from our empirical analyses, contributing evidence to the discussion of bias and sustainability in ASR systems.
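As a rough illustration of what a group-wise bias analysis can look like, the sketch below computes word error rate (WER) per demographic group and the spread between the best and worst group. It is a minimal example under stated assumptions, not the evaluation code used in this work: the function names, the group labels, and the max-minus-min gap are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Corpus-level WER per group.

    `samples` is an iterable of (group, reference, hypothesis) tuples.
    The group labels are illustrative; any hashable label works.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        errors[group] += edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    # Skip groups with no reference words to avoid division by zero.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


def wer_gap(group_wer):
    """Assumed disparity measure: worst-group WER minus best-group WER."""
    values = list(group_wer.values())
    return max(values) - min(values)


if __name__ == "__main__":
    demo = [
        ("female", "the cat sat", "the cat sat"),
        ("male", "the cat sat down", "the cat sat"),
    ]
    scores = wer_by_group(demo)
    print(scores, wer_gap(scores))
```

A single gap value hides which group is disadvantaged, so in practice the per-group scores would be reported alongside it.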
