On the social bias of speech self-supervised models

Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-yi Lee

TL;DR

This work investigates embedding-level social bias in speech self-supervised learning (SSL) models and how architecture, size, training steps, and compression influence bias propagation. It evaluates HuBERT, Wav2Vec2, and MelHuBERT using SpEAT to quantify gender, age, and nationality biases, and assesses four compression techniques: row pruning, head pruning, weight pruning, and distillation. The findings show that SSL embeddings amplify biases relative to traditional features, that larger models are not always more biased, and that longer pretraining often increases bias, particularly for gender. Among compression methods, row pruning most consistently reduces bias, while weight pruning, head pruning, and distillation show mixed or limited debiasing effects, though all pruning approaches reduce age bias. These results offer practical guidance for debiasing SSL speech representations and point toward more equitable speech foundation models through targeted compression strategies.

Abstract

Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet their biased outcomes, especially those affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by automating discriminatory patterns and reinforcing inequitable systems. This work reveals that prevalent SSL models inadvertently acquire biased associations. We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within these models. Finally, we explore the efficacy of debiasing SSL models through regularization techniques, specifically via model compression. Our findings reveal that techniques such as row pruning and training wider, shallower models can effectively mitigate social bias in SSL models.

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Model training steps versus SpEAT effect size.
  • Figure 2: SpEAT $d$ versus parameters removed after applying three pruning methods to Wav2Vec2 (a, c, e) and MelHuBERT (b, d, f). Points further to the right correspond to more parameters removed. The dashed lines mark the SpEAT $d$ measured on the unpruned model.
  • Figure 3: Effect of model distillation on (a) Wav2Vec2, (b) HuBERT, and (c) MelHuBERT. 2, 4, and 6 stand for the number of layers in the distilled model. "base" stands for the model before distillation.
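
The SpEAT $d$ plotted in these figures is an embedding association effect size. The sketch below shows how such a score could be computed, assuming SpEAT follows the standard WEAT-style formulation: the difference in mean association between two target sets, divided by the standard deviation of the associations over both sets. The function names, the use of one fixed-size embedding vector per utterance, and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def _unit_rows(m):
    """Scale each row to unit L2 norm so dot products equal cosine similarities."""
    m = np.asarray(m, dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def association(W, A, B):
    """s(w, A, B) for every row w of W:
    mean cosine similarity to attribute set A minus mean cosine similarity to B."""
    W, A, B = _unit_rows(W), _unit_rows(A), _unit_rows(B)
    return (W @ A.T).mean(axis=1) - (W @ B.T).mean(axis=1)


def effect_size(X, Y, A, B):
    """WEAT-style effect size d between target sets X, Y and attribute sets A, B.

    Each argument is a 2-D array with one embedding per row
    (assumed here: one pooled vector per utterance).
    """
    s_x = association(X, A, B)
    s_y = association(Y, A, B)
    pooled = np.concatenate([s_x, s_y])
    # Assumption: sample standard deviation over the union of X and Y.
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)
```

Under this formulation, a positive $d$ means the embeddings in `X` sit closer to `A` (and those in `Y` closer to `B`) than the reverse. Swapping `X` and `Y` flips the sign, and a value near zero indicates no measurable association.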