On the social bias of speech self-supervised models
Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-yi Lee
TL;DR
This work investigates embedding-level social bias in speech self-supervised learning (SSL) models and how architecture, size, training steps, and compression influence bias propagation. It evaluates HuBERT, Wav2Vec2, and MelHuBERT using SpEAT to quantify gender, age, and nationality biases, and assesses four compression techniques: row pruning, head pruning, weight pruning, and distillation. The findings show that SSL embeddings amplify biases relative to traditional features, that larger models do not always exhibit higher bias, and that longer pretraining often increases bias, particularly for gender. Among compression methods, row pruning most consistently reduces bias, while weight pruning, head pruning, and distillation exhibit mixed or limited debiasing effects, though all pruning approaches reduce age bias. These results offer practical guidance for debiasing SSL speech representations and suggest that targeted compression strategies can lead to more equitable speech foundation models.
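To make the bias measurement concrete, the following is a minimal sketch of an embedding association effect size. It assumes SpEAT follows the WEAT-style formulation, in which two sets of target embeddings (for example, speech from two social groups) are compared by their cosine similarity to two sets of attribute embeddings. The function names, the toy two-dimensional vectors, and the use of the sample standard deviation are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer embedding w is to attribute set A than to attribute set B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size d (assumed form of the SpEAT statistic).

    d > 0 means target set X is more associated with A (and Y with B);
    a larger |d| indicates a stronger differential association.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Sample standard deviation over all targets (an assumption of this sketch).
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings; real inputs would be pooled SSL model features.
group_x = [[1.0, 0.1], [0.9, 0.2]]
group_y = [[0.1, 1.0], [0.2, 0.9]]
attr_a = [[1.0, 0.0]]
attr_b = [[0.0, 1.0]]

d = effect_size(group_x, group_y, attr_a, attr_b)
print(f"effect size d = {d:.3f}")
```

In practice, each vector would be an utterance-level embedding extracted from a model such as HuBERT, and comparing `d` before and after compression would indicate whether a given pruning or distillation setting reduces the measured association.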
Abstract
Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet their biased outcomes, which especially affect marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by automating discriminatory patterns and reinforcing inequitable systems. This work reveals that prevalent SSL models inadvertently acquire biased associations. We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within these models. Finally, we explore the efficacy of debiasing SSL models through regularization techniques, specifically via model compression. Our findings reveal that employing techniques such as row pruning and training wider, shallower models can effectively mitigate social bias within SSL models.
