FairLENS: Assessing Fairness in Law Enforcement Speech Recognition
Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Bryan Wheeler, Yasser Ibrahim
TL;DR
This paper addresses fairness gaps in automatic speech recognition (ASR) for law-enforcement use, where accuracy varies across demographic groups and acoustic conditions. It introduces FairLENS, an adaptable fairness evaluation framework built on a word error rate (WER)-based disparity metric and a Wilcoxon signed-rank test, together with a new FairLENS dataset featuring self-identified demographics and diverse real-world scenarios. Applied to 1 open-source and 11 commercial ASR models, the analysis reveals heterogeneous fairness profiles and biases affecting groups such as Asian, African American, and teenage speakers and speakers with Southern accents, with performance degradation amplified by acoustic domain shifts. The work provides a principled tool for model selection in safety-critical contexts and highlights the need for more diverse and balanced training data to reduce demographic biases.
Abstract
Automatic speech recognition (ASR) techniques have become powerful tools for improving efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, describing the fairness discrepancies between models with confidence remains a challenge, and most public ASR datasets are insufficient for a satisfactory fairness evaluation. To address these limitations, we built FairLENS, a systematic fairness evaluation framework. We propose a novel and adaptable evaluation method to examine fairness disparities between different models. We also collected a fairness evaluation dataset covering multiple scenarios and demographic dimensions. Leveraging this framework, we conducted fairness assessments on 1 open-source and 11 commercially available state-of-the-art ASR models. Our results reveal that certain models exhibit more bias than others, offering a fairness guideline that helps users make informed choices when selecting ASR models for a given real-world scenario. We further explored model biases toward specific demographic groups and observed that shifts in the acoustic domain can lead to the emergence of new biases.
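The summary names, but does not define, the WER-based disparity metric and the Wilcoxon signed-rank comparison between models. The following is a minimal Python sketch, under stated assumptions, of how such a pipeline could fit together: per-utterance WER as word-level edit distance divided by reference length, a hypothetical disparity score (each group's mean WER minus the overall mean WER), and an exact two-sided Wilcoxon signed-rank test that pairs two models' absolute gaps over shared demographic groups. The names `group_gaps` and `compare_models`, the gap definition, and the choice to pair over groups are illustrative assumptions, not FairLENS's actual definitions.

```python
from collections import defaultdict
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    # WER is undefined for an empty reference; here we simply divide by 1.
    return prev[-1] / max(len(ref), 1)


def group_gaps(records):
    """HYPOTHETICAL disparity: each group's mean WER minus the overall mean WER.

    records: iterable of (group_label, utterance_wer) pairs for one model.
    """
    records = list(records)
    overall = mean(w for _, w in records)
    by_group = defaultdict(list)
    for group, w in records:
        by_group[group].append(w)
    return {g: mean(ws) - overall for g, ws in by_group.items()}


def _average_abs_ranks(values):
    """1-based ranks of |values|, averaging ranks within exact ties."""
    order = sorted(range(len(values)), key=lambda k: abs(values[k]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided exact Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped (Wilcoxon's convention). Returns (W+, p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        return 0.0, 1.0
    ranks = _average_abs_ranks(diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Doubling makes averaged (half-integer) tie ranks integral for counting.
    doubled = [round(2 * r) for r in ranks]
    counts = [0] * (sum(doubled) + 1)
    counts[0] = 1
    for r in doubled:  # under H0 each rank is independently + or -
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** len(doubled)
    w2 = round(2 * w_plus)
    lower = sum(counts[: w2 + 1]) / total
    upper = sum(counts[w2:]) / total
    return w_plus, min(1.0, 2 * min(lower, upper))


def compare_models(gaps_a, gaps_b):
    """HYPOTHETICAL: pair |gap| values by shared group label and test them."""
    groups = sorted(set(gaps_a) & set(gaps_b))
    return wilcoxon_signed_rank([abs(gaps_a[g]) for g in groups],
                                [abs(gaps_b[g]) for g in groups])


if __name__ == "__main__":
    # Made-up |gap| values for two models over the same five groups.
    model_a = [0.10, 0.20, 0.30, 0.40, 0.50]
    model_b = [0.00, 0.00, 0.00, 0.00, 0.00]
    print(wilcoxon_signed_rank(model_a, model_b))  # (15.0, 0.0625)
```

The exact null distribution is counted by dynamic programming over doubled ranks, which keeps averaged tie ranks integral. Its cost grows roughly with the number of pairs times the rank sum, so it suits a handful of demographic groups; for thousands of paired utterances, a normal approximation would be the usual substitute.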
