On the social bias of speech self-supervised models

Yi-Cheng Lin; Tzu-Quan Lin; Hsi-Che Lin; Andy T. Liu; Hung-yi Lee


TL;DR

This work investigates embedding-level social bias in speech self-supervised learning (SSL) models and how architecture, size, training steps, and compression influence bias propagation. It evaluates HuBERT, Wav2Vec2, and MelHuBERT using SpEAT to quantify gender, age, and nationality biases, and assesses compression techniques including row pruning, head pruning, weight pruning, and distillation. The findings show that SSL embeddings amplify biases relative to traditional features, that larger models are not always more biased, and that longer pretraining often increases bias, particularly for gender. Among compression methods, row pruning most consistently reduces bias, while weight pruning, head pruning, and distillation have mixed or limited debiasing effects, although all pruning approaches reduce age bias. These results offer practical guidance for debiasing SSL speech representations and point toward more equitable foundational speech models built with targeted compression strategies.
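
SpEAT reports bias as an effect size $d$. The sketch below assumes SpEAT follows the WEAT-style formulation it is derived from: for target sets $X, Y$ (e.g. utterances from two social groups) and attribute sets $A, B$ (e.g. pleasant and unpleasant stimuli), $s(w, A, B)$ is the mean cosine similarity of $w$ to $A$ minus its mean cosine similarity to $B$, and $d$ is the difference between the mean of $s$ over $X$ and over $Y$, divided by the standard deviation of $s$ over $X \cup Y$. The function names are hypothetical, and each utterance is assumed to be already reduced to a single vector (for example by mean-pooling one layer's hidden states); the paper's exact pooling and layer choice are not stated in this summary.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w: np.ndarray, A, B) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus its mean similarity to B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))


def effect_size(X, Y, A, B) -> float:
    """WEAT-style effect size d for target sets X, Y and attribute sets A, B.

    Positive d means X is more associated with A (and Y with B) than the reverse.
    Assumption: sample standard deviation (ddof=1) over the pooled targets X and Y.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    pooled = np.array(s_x + s_y)
    return float((np.mean(s_x) - np.mean(s_y)) / np.std(pooled, ddof=1))


# Toy 2-D "embeddings": X points toward A, Y points toward B.
X = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
Y = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
d = effect_size(X, Y, A, B)  # positive: X is associated with A
```

Swapping the two target sets flips the sign of $d$ without changing its magnitude, since the pooled standard deviation is unchanged.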

Abstract

Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet their biased outcomes, especially those affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by automating discriminatory patterns and reinforcing inequitable systems. This work reveals that prevalent SSL models inadvertently acquire biased associations. We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within these models. Finally, we explore the efficacy of debiasing SSL models through regularization techniques, specifically via model compression. Our findings reveal that employing techniques such as row pruning and training wider, shallower models can effectively mitigate social bias within SSL models.
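
The three pruning methods differ mainly in granularity. As a rough illustration, assuming simple magnitude-based importance scores (the paper's actual pruning criteria are not given in this summary): weight pruning zeroes individual small-magnitude entries, row pruning deletes whole rows of a weight matrix, and head pruning deletes whole attention heads from a projection whose output rows are grouped head by head. All function names below are hypothetical.

```python
import numpy as np


def weight_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured pruning: zero the `sparsity` fraction of entries with the
    smallest absolute value (assumption: magnitude criterion)."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    # Threshold at the k-th smallest magnitude; ties may prune a few extra entries.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= threshold, 0.0, W)


def row_prune(W: np.ndarray, keep_ratio: float) -> tuple[np.ndarray, np.ndarray]:
    """Structured pruning: keep the rows with the largest L2 norm and drop the
    rest, so the matrix itself shrinks (assumption: L2-norm criterion)."""
    n_keep = max(1, int(round(keep_ratio * W.shape[0])))
    norms = np.linalg.norm(W, axis=1)
    kept = np.sort(np.argsort(norms)[-n_keep:])  # preserve original row order
    return W[kept], kept


def head_prune(W_proj: np.ndarray, n_heads: int, keep_heads: int) -> tuple[np.ndarray, np.ndarray]:
    """Drop whole attention heads from a projection whose output rows form
    n_heads equal blocks (assumption: score each head by its weight norm)."""
    head_dim = W_proj.shape[0] // n_heads
    blocks = W_proj.reshape(n_heads, head_dim, -1)
    scores = np.linalg.norm(blocks.reshape(n_heads, -1), axis=1)
    kept = np.sort(np.argsort(scores)[-keep_heads:])
    return blocks[kept].reshape(-1, W_proj.shape[1]), kept
```

Unlike weight pruning, which only introduces zeros, row and head pruning remove entries outright, so the parameter reduction shows up directly in the matrix shape.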


Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, and 4 tables.

Introduction
Experiment setup
Speech Self-supervised models
Bias evaluation for Speech SSL models
Compression methods for Speech SSL models
Result and Analysis
Effect of model architecture on bias
Effect of training steps on bias
Effect of compression on bias
Head pruning
Row pruning
Weight pruning
Distillation
Conclusion
Limitation
...and 1 more section

Figures (3)

Figure 1: Model training steps versus SpEAT effect size.
Figure 2: SpEAT $d$ versus parameters removed after applying three pruning methods to Wav2Vec2 (a, c, e) and MelHuBERT (b, d, f). Points farther to the right correspond to more parameters removed. The dashed lines show the SpEAT $d$ measured on the unpruned model.
Figure 3: Effect of model distillation on (a) Wav2Vec2, (b) HuBERT, and (c) MelHuBERT. The labels 2, 4, and 6 denote the number of layers in the distilled model; "base" denotes the model before distillation.
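
The distillation objective is not given in this summary. As an assumption, the sketch below follows the DistilHuBERT-style layer-wise loss: for each teacher layer the student predicts, it averages over frames a per-frame mean L1 distance minus $\lambda \log \sigma(\cos(h_t, \hat{h}_t))$, and sums over predicted layers. The function names and the default $\lambda = 1$ are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np


def log_sigmoid(x: np.ndarray) -> np.ndarray:
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def layer_distill_loss(student: np.ndarray, teacher: np.ndarray, lam: float = 1.0) -> float:
    """Loss for one predicted teacher layer; student and teacher are (T, D) frame features."""
    l1 = np.mean(np.abs(student - teacher), axis=1)  # per-frame mean L1, shape (T,)
    cos = np.sum(student * teacher, axis=1) / (
        np.linalg.norm(student, axis=1) * np.linalg.norm(teacher, axis=1) + 1e-8
    )  # per-frame cosine similarity, shape (T,)
    return float(np.mean(l1 - lam * log_sigmoid(cos)))


def distill_loss(student_preds, teacher_layers, lam: float = 1.0) -> float:
    """Sum the per-layer loss over every teacher layer the student is trained to predict."""
    return sum(layer_distill_loss(s, t, lam) for s, t in zip(student_preds, teacher_layers))
```

When the student output matches the teacher exactly, the L1 term is zero and the loss bottoms out at $-\lambda \log \sigma(1)$ per layer rather than at zero, because the cosine term is bounded.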
