MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment

Wenze Ren, Yi-Cheng Lin, Wen-Chin Huang, Erica Cooper, Ryandhimas E. Zezario, Hsin-Min Wang, Hung-yi Lee, Yu Tsao

TL;DR

This study shows that gender bias in MOS is a systematic, learnable pattern that equitable speech evaluation must account for. It proposes a gender-aware model that learns gender-specific scoring patterns by abstracting binary group embeddings, improving both overall and gender-specific prediction accuracy.

Abstract

The Mean Opinion Score (MOS) serves as the standard metric for speech quality assessment, yet biases in human annotations remain underexplored. We conduct the first systematic analysis of gender bias in MOS, revealing that male listeners consistently assign higher scores than female listeners, a gap that is most pronounced in low-quality speech and gradually diminishes as quality improves. This quality-dependent structure proves difficult to eliminate through simple calibration. We further demonstrate that automated MOS models trained on aggregated labels exhibit predictions skewed toward male standards of perception. To address this, we propose a gender-aware model that learns gender-specific scoring patterns through abstracting binary group embeddings, thereby improving overall and gender-specific prediction accuracy. This study establishes that gender bias in MOS constitutes a systematic, learnable pattern demanding attention in equitable speech evaluation.
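To make the quality-dependent gap concrete, the following minimal sketch computes the mean male-minus-female listener rating difference per quality tier. The ratings are made-up toy values, not data from the paper, and the tiering rule (four equal-width bins over the 1-5 MOS range, assigned by each utterance's overall mean) and the names `RATINGS`, `quality_tier`, and `gender_gap_by_tier` are assumptions for illustration only; the paper's own tier definition may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ratings (NOT from the paper):
# (utterance_id, listener_gender, score on a 1-5 scale).
RATINGS = [
    ("u1", "M", 2), ("u1", "F", 1), ("u1", "M", 2), ("u1", "F", 1),
    ("u2", "M", 3), ("u2", "F", 2), ("u2", "M", 3), ("u2", "F", 3),
    ("u3", "M", 4), ("u3", "F", 4), ("u3", "M", 4), ("u3", "F", 3),
    ("u4", "M", 5), ("u4", "F", 5), ("u4", "M", 5), ("u4", "F", 5),
]


def quality_tier(mos):
    """Assumed tiering: [1,2) -> 0, [2,3) -> 1, [3,4) -> 2, [4,5] -> 3."""
    return min(int(mos) - 1, 3)


def gender_gap_by_tier(ratings):
    """Return {tier: mean(male listener mean - female listener mean)}."""
    # Group scores per utterance and per listener gender.
    by_utt = defaultdict(lambda: {"M": [], "F": []})
    for utt, gender, score in ratings:
        by_utt[utt][gender].append(score)

    gaps = defaultdict(list)
    for scores in by_utt.values():
        if not scores["M"] or not scores["F"]:
            continue  # a gap needs ratings from both listener groups
        overall = mean(scores["M"] + scores["F"])
        gaps[quality_tier(overall)].append(mean(scores["M"]) - mean(scores["F"]))

    # Average the per-utterance gaps within each tier.
    return {tier: mean(vals) for tier, vals in sorted(gaps.items())}


if __name__ == "__main__":
    for tier, gap in gender_gap_by_tier(RATINGS).items():
        print(f"tier {tier}: male - female = {gap:+.2f}")
```

A positive value means male listeners rated that tier higher on average. A single additive offset cannot remove a gap that changes from tier to tier, which is one way to read the abstract's remark that simple calibration struggles with this structure.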

Paper Structure (17 sections, 3 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Mean rating difference between male and female listeners across the four quality tiers of the BVCC training set, stratified by speaker gender.
  • Figure 2: The proposed gender-aware MOS prediction architecture. A shared SSL encoder feeds into two networks: a Mean Net predicting overall MOS, and a Gender Net predicting gender-specific MOS scores with shared projection weights.
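The Figure 2 caption can be read as the forward pass sketched below. This is a dependency-free illustration under stated assumptions, not the authors' implementation: the SSL encoder is replaced by precomputed frame features, both heads are single linear layers, the group embedding is added to the pooled features, and the class name `GenderAwareMOSSketch`, the helper `dot`, the feature size, and the bias initialization are all hypothetical. Which embedding row corresponds to which listener group is arbitrary here, and training is not shown.

```python
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class GenderAwareMOSSketch:
    """Toy forward pass: shared features -> Mean Net and Gender Net heads."""

    def __init__(self, dim=8, seed=0):
        rnd = random.Random(seed)

        def init():
            return [rnd.gauss(0.0, 0.1) for _ in range(dim)]

        # Mean Net: one linear head predicting the overall MOS (assumed form).
        self.mean_w, self.mean_b = init(), 3.0
        # Gender Net: two group embeddings (binary group index 0 / 1) ...
        self.group_emb = [init(), init()]
        # ... passed through ONE projection shared by both groups.
        self.proj_w, self.proj_b = init(), 3.0

    def forward(self, frames):
        """frames: list of per-frame feature lists from a stand-in encoder."""
        n = len(frames)
        # Mean-pool frame features into one utterance-level vector.
        pooled = [sum(col) / n for col in zip(*frames)]
        overall = dot(pooled, self.mean_w) + self.mean_b
        # Condition the pooled features on each group embedding, then apply
        # the shared projection to obtain one score per group.
        per_group = [
            dot([p + e for p, e in zip(pooled, emb)], self.proj_w) + self.proj_b
            for emb in self.group_emb
        ]
        return overall, per_group


if __name__ == "__main__":
    rnd = random.Random(1)
    frames = [[rnd.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
    overall, per_group = GenderAwareMOSSketch().forward(frames)
    print("overall:", overall, "per group:", per_group)
```

Because both group scores pass through the same projection, the two predictions differ only through the group embeddings, which is one plausible reading of "shared projection weights" in the caption.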