Gender Stereotyping Impact in Facial Expression Recognition

Iris Dominguez-Catena; Daniel Paternain; Mikel Galar

TL;DR

This paper investigates how stereotypical gender bias embedded in FER datasets affects model performance. It constructs derivative datasets from FER+ with controlled gender proportions in selected emotion labels and assesses recall disparities between apparent gender groups using a VGG11 baseline. The study reports recall gaps of up to 29% under the most extreme bias conditions, suggests a safety range of dataset bias within which the trained model shows no apparent stereotypical bias, and emphasizes the need for bias auditing and careful data curation in FER. The findings highlight that global demographic balance can conceal label-specific biases, underscoring the importance of dataset-level bias mitigation to prevent harms in human–AI interactions.

Abstract

Facial Expression Recognition (FER) uses images of faces to identify the emotional state of users, allowing for a closer interaction between humans and autonomous systems. Unfortunately, as the images naturally integrate some demographic information, such as the apparent age, gender, and race of the subject, these systems are prone to demographic bias issues. In recent years, machine learning-based models have become the most popular approach to FER. These models require training on large datasets of facial expression images, and their generalization capabilities are strongly related to the characteristics of the dataset. In publicly available FER datasets, apparent gender representation is usually mostly balanced overall, but its representation within individual labels is not, embedding social stereotypes into the datasets and generating a potential for harm. Although this type of bias has been overlooked so far, it is important to understand the impact it may have in the context of FER. To do so, we use a popular FER dataset, FER+, to generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels. We then measure, for the models trained on these datasets, the discrepancy in performance between the apparent gender groups. We observe a discrepancy in the recognition of certain emotions between genders of up to 29% under the worst bias conditions. Our results also suggest a safety range for stereotypical bias in a dataset that does not appear to produce stereotypical bias in the resulting model. Our findings support the need for a thorough bias analysis of public datasets in problems like FER, where a global balance of demographic representation can still hide other types of bias that harm certain demographic groups.
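The two steps described in the abstract, altering the gender proportion of a label and then comparing recall between apparent gender groups, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `biased_subset` and `recall_gap`, the `(label, gender)` sample format, the `"F"`/`"M"` codes, and the subsampling rule are all assumptions made for illustration. The sign convention follows the Figure 2 caption, where positive values mean a higher recall for the Female group.

```python
import random
from collections import defaultdict


def biased_subset(samples, label, female_share, seed=0):
    """Subsample one label so that roughly `female_share` of it is female.

    `samples` is a list of (label, gender) pairs with gender "F" or "M"
    (a hypothetical format). Items with other labels are kept unchanged.
    """
    rng = random.Random(seed)
    female = [s for s in samples if s == (label, "F")]
    male = [s for s in samples if s == (label, "M")]
    rest = [s for s in samples if s[0] != label]
    # Largest subset size that both groups can supply at this proportion.
    limits = []
    if female_share > 0:
        limits.append(len(female) / female_share)
    if female_share < 1:
        limits.append(len(male) / (1 - female_share))
    size = int(min(limits))
    n_female = round(size * female_share)
    n_male = size - n_female
    return rest + rng.sample(female, n_female) + rng.sample(male, n_male)


def recall_gap(y_true, y_pred, genders, label):
    """Female recall minus male recall for one label.

    Assumes both groups have at least one true example of `label`.
    """
    hits = defaultdict(int)
    support = defaultdict(int)
    for truth, pred, gender in zip(y_true, y_pred, genders):
        if truth == label:
            support[gender] += 1
            hits[gender] += int(pred == label)
    return hits["F"] / support["F"] - hits["M"] / support["M"]
```

In this sketch, a model would be trained on each output of `biased_subset` and `recall_gap` would then be evaluated per emotion label on a held-out test set. How the paper fixes subset sizes across bias levels is not stated in this excerpt, so the size rule above is only one possible choice.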

Paper Structure (17 sections, 2 equations, 2 figures, 1 table)

Introduction
Related work
Facial Expression Recognition
Bias
Methodology
Datasets
Demographic Relabeling
Generation of derivative datasets
Stratified subsets.
Balanced subsets.
Biased subsets.
Experiments
Experimental Setup
Results and Discussion
Dataset initial bias
...and 2 more sections

Figures (2)

Figure 1: FER+ gender distribution and support by label.
Figure 2: Recall difference between the Male and Female groups for each emotion label. Positive numbers mean a higher recall for the Female group than for the Male one. The baseline balanced datasets are plotted according to size, aligned with the corresponding biased datasets.
