Biased Heritage: How Datasets Shape Models in Facial Expression Recognition

Iris Dominguez-Catena, Daniel Paternain, Mikel Galar, MaryBeth Defrance, Maarten Buyl, Tijl De Bie

TL;DR

FER fairness is challenged by multiclass outputs and multiple demographic groups. The authors propose a controlled bias induction framework on AffectNet and novel multiclass, multi-group bias metrics to trace bias propagation from data to models. They introduce new model-bias adaptations (TTEqOdds, CEqOdds, EqOpp, DemPar) alongside established metrics, and find that stereotypical bias propagates more strongly than representational bias, while biased datasets reduce both fairness and accuracy. The work informs FER dataset design and evaluation, suggesting that dataset-level mitigation can improve both fairness and performance in real-world deployments.
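
This summary does not give the paper's exact formula for DemPar, so the sketch below uses one common generic way to extend demographic parity to many classes and many groups, purely as an assumption for illustration. For each predicted class c it computes P(pred = c | group g) for every group, takes the spread between the highest and lowest rate, and averages the spreads over classes. The function name `demographic_parity_gap` is hypothetical.

```python
from collections import Counter, defaultdict


def demographic_parity_gap(preds, groups):
    """Hypothetical multiclass, multi-group demographic parity gap.

    Assumed formulation (not necessarily the paper's DemPar): for each
    predicted class, take max - min of P(pred == c | group) across groups,
    then average over classes. 0.0 means every group receives each class
    at the same rate; larger values mean larger disparities.
    """
    if len(preds) != len(groups) or not preds:
        raise ValueError("preds and groups must be non-empty and equal length")

    # Count predictions per demographic group.
    per_group = defaultdict(Counter)
    for pred, group in zip(preds, groups):
        per_group[group][pred] += 1

    gaps = []
    for c in sorted(set(preds)):
        # Rate at which each group is assigned class c.
        rates = [counts[c] / sum(counts.values()) for counts in per_group.values()]
        gaps.append(max(rates) - min(rates))
    return sum(gaps) / len(gaps)
```

For example, if every "F" sample is predicted "happy" and every "M" sample "sad", the gap is 1.0; if both groups receive "happy" and "sad" equally often, it is 0.0. Averaging over classes is one design choice among several (taking the maximum class gap is another).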

Abstract

In recent years, the rapid development of artificial intelligence (AI) systems has raised concerns about our ability to ensure their fairness, that is, how to avoid discrimination based on protected characteristics such as gender, race, or age. While algorithmic fairness is well-studied in simple binary classification tasks on tabular data, its application to complex, real-world scenarios, such as Facial Expression Recognition (FER), remains underexplored. FER presents unique challenges: it is inherently multiclass, and biases emerge across intersecting demographic variables, each potentially comprising multiple protected groups. We present a comprehensive framework to analyze bias propagation from datasets to trained models in image-based FER systems, while introducing new bias metrics specifically designed for multiclass problems with multiple demographic groups. Our methodology studies bias propagation by (1) inducing controlled biases in FER datasets, (2) training models on these biased datasets, and (3) analyzing the correlation between dataset bias metrics and model fairness notions. Our findings reveal that stereotypical biases propagate more strongly to model predictions than representational biases, suggesting that preventing emotion-specific demographic patterns should be prioritized over general demographic balance in FER datasets. Additionally, we observe that biased datasets lead to reduced model accuracy, challenging the assumed fairness-accuracy trade-off.
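
Step (1), inducing controlled biases, can be pictured as subsampling a labeled dataset. The sketch below is an assumption about how such induction might work, not the authors' AffectNet procedure. Samples from groups other than a favored group are kept with probability `keep_fraction`: applied to every emotion, this yields representational bias (the favored group is overrepresented overall); applied to a single emotion, it yields stereotypical bias (that emotion becomes associated with the favored group). The function `induce_bias` and its parameters are hypothetical, and real samples would also carry image identifiers.

```python
import random


def induce_bias(samples, favored_group, keep_fraction, emotion=None, seed=0):
    """Hypothetical controlled bias induction by subsampling.

    samples: list of (emotion_label, demographic_group) pairs.
    Samples whose group differs from favored_group are kept with
    probability keep_fraction. If emotion is None the drop applies to
    all classes (representational bias); otherwise only to that emotion
    (stereotypical bias).
    """
    rng = random.Random(seed)  # fixed seed keeps the induced dataset reproducible
    biased = []
    for label, group in samples:
        targeted = group != favored_group and (emotion is None or label == emotion)
        if targeted and rng.random() >= keep_fraction:
            continue  # drop this sample
        biased.append((label, group))
    return biased
```

Sweeping `keep_fraction` from 1.0 (no induced bias) toward 0.0 (maximal bias) would produce a family of training sets with increasing bias, which is what steps (2) and (3) then need.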

Paper Structure

This paper contains 26 sections, 28 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Examples of dataset bias affecting model training. (a) shows an example of racial representational bias, where certain demographic groups are overrepresented in the training set. (b) illustrates gender stereotypical bias, where specific emotions are associated with particular demographic groups (e.g., happiness expressions predominantly from female subjects).
  • Figure 2: Summary of the methodology
  • Figure 3: Measured dataset bias according to different metrics.
  • Figure 4: Measured model bias according to different metrics.
  • Figure 5: Spearman's $\rho$ rank correlation between the dataset and model bias metrics.
  • ...and 2 more figures
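
Figure 5 relates dataset bias to model bias through Spearman's rank correlation. As a minimal sketch, assuming one dataset-bias score and one model-bias score per biased training run, ρ can be computed as the Pearson correlation of the two score lists' ranks, with tied values receiving the average of their ranks. `scipy.stats.spearmanr` computes the same statistic; the standalone functions below (hypothetical names) avoid that dependency.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation between average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either input is entirely constant.
    return cov / (vx * vy) ** 0.5
```

Because ρ only uses ranks, it captures whether more-biased datasets consistently yield more-biased models without assuming a linear relationship between the two metrics' scales.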