MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment

Wenze Ren; Yi-Cheng Lin; Wen-Chin Huang; Erica Cooper; Ryandhimas E. Zezario; Hsin-Min Wang; Hung-yi Lee; Yu Tsao

TL;DR

This study shows that gender bias in MOS is a systematic, learnable pattern that demands attention in equitable speech evaluation, and proposes a gender-aware model that learns gender-specific scoring patterns by abstracting listener gender into binary group embeddings, improving both overall and gender-specific prediction accuracy.

Abstract

The Mean Opinion Score (MOS) serves as the standard metric for speech quality assessment, yet biases in human annotations remain underexplored. We conduct the first systematic analysis of gender bias in MOS, revealing that male listeners consistently assign higher scores than female listeners, a gap that is most pronounced for low-quality speech and gradually diminishes as quality improves. This quality-dependent structure proves difficult to eliminate through simple calibration. We further demonstrate that automated MOS models trained on aggregated labels produce predictions skewed toward male standards of perception. To address this, we propose a gender-aware model that learns gender-specific scoring patterns by abstracting listener gender into binary group embeddings, thereby improving both overall and gender-specific prediction accuracy. This study establishes that gender bias in MOS is a systematic, learnable pattern that demands attention in equitable speech evaluation.
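Why a quality-dependent gap resists simple calibration can be illustrated with a small sketch: measure the male-minus-female mean score within each quality tier, then subtract one global offset from all male scores and re-measure. The ratings below are invented toy numbers, not BVCC data; quality tiers are assumed to be precomputed (for example by binning overall MOS), and the helper names `tier_gaps`, `global_offset`, and `residual_gaps_after_offset` are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ratings (NOT BVCC data): (quality_tier, listener_gender, score).
# Tier 1 = lowest quality, tier 4 = highest; genders are "M" or "F".
TOY_RATINGS = [
    (1, "M", 2.0), (1, "M", 2.4), (1, "F", 1.5), (1, "F", 1.7),
    (2, "M", 3.0), (2, "M", 3.2), (2, "F", 2.7), (2, "F", 2.7),
    (3, "M", 3.8), (3, "M", 4.0), (3, "F", 3.7), (3, "F", 3.7),
    (4, "M", 4.5), (4, "M", 4.5), (4, "F", 4.5), (4, "F", 4.5),
]


def tier_gaps(ratings):
    """Mean male score minus mean female score, computed separately per tier."""
    by_tier = defaultdict(lambda: {"M": [], "F": []})
    for tier, gender, score in ratings:
        by_tier[tier][gender].append(score)
    return {
        tier: mean(groups["M"]) - mean(groups["F"])
        for tier, groups in sorted(by_tier.items())
        if groups["M"] and groups["F"]  # skip tiers missing one listener group
    }


def global_offset(ratings):
    """A 'simple calibration': one constant male-minus-female offset for all data."""
    male = [s for _, g, s in ratings if g == "M"]
    female = [s for _, g, s in ratings if g == "F"]
    return mean(male) - mean(female)


def residual_gaps_after_offset(ratings):
    """Subtract the global offset from male scores, then re-measure per-tier gaps."""
    offset = global_offset(ratings)
    calibrated = [(t, g, s - offset if g == "M" else s) for t, g, s in ratings]
    return tier_gaps(calibrated)


if __name__ == "__main__":
    print("per-tier gaps:", tier_gaps(TOY_RATINGS))
    print("global offset:", global_offset(TOY_RATINGS))
    print("residual gaps:", residual_gaps_after_offset(TOY_RATINGS))
```

With these toy numbers the per-tier gap shrinks from 0.6 in tier 1 to 0.0 in tier 4 and the global offset is 0.3, so after calibration tier 1 still shows a +0.3 gap while tier 4 flips to -0.3: a constant shift under-corrects low-quality speech and over-corrects high-quality speech. Adding speaker gender to the grouping key would mirror the stratification described for Figure 1.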

Paper Structure (17 sections, 3 equations, 2 figures, 4 tables)

Related Work
Bias in Human Annotation
Fairness in Speech Processing
Bias Analysis in MOS Annotations
Dataset and Toolkit
Problem Formulation
Overall Gender Rating Difference
Quality-Dependent Gender Difference
Limitations of Gender-Agnostic MOS
Experimental Setup
Bias Inheritance Analysis
Gender-Aware MOS Prediction
Model Architecture
Training Objectives
Results and Analysis
...and 2 more sections

Figures (2)

Figure 1: Mean rating difference between male and female listeners across the four quality tiers of the BVCC training set, stratified by speaker gender.
Figure 2: The proposed gender-aware MOS prediction architecture. A shared SSL encoder feeds into two networks: a Mean Net predicting overall MOS, and a Gender Net predicting gender-specific MOS scores with shared projection weights.
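The two-head layout in Figure 2 can be sketched without any deep-learning framework. This is a minimal illustration under stated assumptions, not the authors' implementation: the SSL encoder output is assumed to be already pooled into a feature vector, each head is assumed to be a tanh followed by a linear projection, the binary group embedding is assumed to be added to the features, and the squared-error multi-task loss and its `weight` are assumptions standing in for the paper's unspecified training objectives. `GenderAwareHeadSketch` and `multitask_loss` are hypothetical names.

```python
import math
import random


class GenderAwareHeadSketch:
    """Toy Mean Net + Gender Net over a pooled SSL feature vector.

    Assumptions (not taken from the paper): features are pooled to length
    `dim`; each head is tanh followed by a linear projection; the group
    embedding is added to the features before the shared projection.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

        self.mean_w, self.mean_b = init(), 0.0      # Mean Net projection
        self.gender_w, self.gender_b = init(), 0.0  # projection shared by both groups
        self.group_emb = [init(), init()]           # binary group embeddings (index 0 / 1)

    @staticmethod
    def _project(w, b, x):
        # tanh nonlinearity, then a linear projection to a scalar score
        return sum(wi * math.tanh(xi) for wi, xi in zip(w, x)) + b

    def predict_mean(self, h):
        """Overall MOS prediction from pooled features h."""
        return self._project(self.mean_w, self.mean_b, h)

    def predict_gender(self, h, group):
        """Group-specific MOS: same weights, features shifted by the group embedding."""
        conditioned = [hi + ei for hi, ei in zip(h, self.group_emb[group])]
        return self._project(self.gender_w, self.gender_b, conditioned)


def multitask_loss(model, h, mos_all, mos_by_group, weight=1.0):
    """Hypothetical objective: squared error on overall MOS plus a weighted
    squared error for every group whose label is available."""
    loss = (model.predict_mean(h) - mos_all) ** 2
    for group, target in mos_by_group.items():
        loss += weight * (model.predict_gender(h, group) - target) ** 2
    return loss


if __name__ == "__main__":
    model = GenderAwareHeadSketch(dim=4)
    h = [0.2, -0.5, 1.0, 0.3]  # stand-in for pooled SSL features (untrained model)
    print(model.predict_mean(h), model.predict_gender(h, 0), model.predict_gender(h, 1))
    print(multitask_loss(model, h, mos_all=3.2, mos_by_group={0: 3.5, 1: 3.0}))
```

Because the projection is shared, the group embeddings are the only group-specific parameters in this sketch. The nonlinearity before the projection lets the predicted gap between the two groups depend on the input features; with a purely linear head the gap would collapse to the constant `gender_w · (e1 - e0)`, which is the kind of constant correction the abstract describes as insufficient. Which index stands for male or female is an arbitrary choice here.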
