Is It Still Fair? Investigating Gender Fairness in Cross-Corpus Speech Emotion Recognition

Shreya G. Upadhyay, Woan-Shiuan Chien, Chi-Chun Lee

TL;DR

This work investigates gender fairness generalizability in cross-corpus speech emotion recognition (SER) and reveals that models fair within a source corpus can exhibit gender biases when transferred to a different corpus. It introduces Combined Fairness Adaptation (CFA), an architecture coupling an emotion-classification block with a fairness-adaptation component that adversarially neutralizes gender information while aligning source and target representations via a contrastive loss. Empirical results on MSP-Podcast and BIIC-Podcast show that CFA improves gender fairness metrics ($\Delta SP$, $\Delta EO$) on the target corpus and, in phonetic-aware variants, maintains competitive emotion recognition performance. The findings highlight the importance of integrating fairness objectives into cross-corpus SER and lay groundwork for more robust, gender-fair emotion understanding across diverse linguistic domains.
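The two fairness metrics named above can be made concrete with a small sketch. It assumes the commonly used definitions: $\Delta SP$ as the absolute gap in positive-prediction rate between two gender groups (statistical parity), and $\Delta EO$ as the absolute gap in true-positive rate (equal opportunity). The paper's exact formulation is not reproduced in this summary, so the binary one-vs-rest framing of an emotion class, the 0/1 gender encoding, and the function names `delta_sp` and `delta_eo` are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def _positive_rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive (1) predictions among the samples selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    if not selected:
        raise ValueError("group is empty; the rate is undefined")
    return sum(selected) / len(selected)


def delta_sp(y_pred: Sequence[int], gender: Sequence[int]) -> float:
    """Assumed definition: |P(y_hat=1 | g=0) - P(y_hat=1 | g=1)|."""
    rate_g0 = _positive_rate(y_pred, [g == 0 for g in gender])
    rate_g1 = _positive_rate(y_pred, [g == 1 for g in gender])
    return abs(rate_g0 - rate_g1)


def delta_eo(y_true: Sequence[int], y_pred: Sequence[int], gender: Sequence[int]) -> float:
    """Assumed definition: |TPR(g=0) - TPR(g=1)|, computed only where y_true == 1."""
    tpr_g0 = _positive_rate(y_pred, [g == 0 and t == 1 for g, t in zip(gender, y_true)])
    tpr_g1 = _positive_rate(y_pred, [g == 1 and t == 1 for g, t in zip(gender, y_true)])
    return abs(tpr_g0 - tpr_g1)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = target emotion present, gender coded as 0/1.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    gender = [0, 0, 0, 0, 1, 1, 1, 1]
    print("delta_sp:", delta_sp(y_pred, gender))
    print("delta_eo:", delta_eo(y_true, y_pred, gender))
```

In this toy data both groups receive positive predictions at the same rate, so $\Delta SP$ is zero, yet the true-positive rates differ. The two metrics capture different notions of fairness, which is one reason to report both on the target corpus.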

Abstract

Speech emotion recognition (SER) is a vital component in various everyday applications. Cross-corpus SER models are increasingly recognized for their ability to generalize across corpora. However, concerns arise regarding their fairness across demographic groups in diverse corpora. Existing fairness research often focuses solely on corpus-specific fairness, neglecting whether that fairness generalizes to cross-corpus scenarios. Our study focuses on this underexplored area, examining how well gender fairness generalizes in cross-corpus SER. We emphasize that the performance of a cross-corpus SER model and its fairness are two distinct considerations. Moreover, we propose a combined fairness adaptation mechanism that enhances gender fairness in SER transfer learning tasks by addressing both source and target genders. Our findings offer some of the first insights into the generalizability of gender fairness in cross-corpus SER systems.

Paper Structure

This paper contains 8 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Proposed (CFA) SER approach using gender-neutral adaptation mechanism for fair cross-corpus SER.
  • Figure 2: t-SNE plot for Anger features.
  • Figure 3: Gender detection (GD) accuracy plot using PA-ReW and PA-CFA model embeddings.