Balancing the Scales: Enhancing Fairness in Facial Expression Recognition with Latent Alignment

Syed Sameen Ahmad Rizvi, Aryan Seth, Pratik Narang

TL;DR

This work leverages representation learning based on latent spaces to mitigate bias in facial expression recognition systems, thereby enhancing a deep learning model's fairness and overall accuracy.

Abstract

Automatically recognizing emotional intent from facial expressions has been thoroughly investigated in computer vision. Facial Expression Recognition (FER), being a supervised learning task, relies heavily on large datasets that represent a range of socio-cultural demographic attributes. Over the past decade, several real-world, in-the-wild FER datasets have been proposed, collected through crowd-sourcing or web-scraping. However, most of these widely used datasets rely on manual annotation to label emotional intent, which inherently propagates individual demographic biases. Moreover, these datasets lack equitable representation of socio-cultural demographic groups, which induces a class imbalance. Bias analysis and mitigation have been investigated across multiple domains and problem settings; in the FER domain, however, they remain relatively underexplored. This work leverages representation learning based on latent spaces to mitigate bias in facial expression recognition systems, thereby enhancing a deep learning model's fairness and overall accuracy.

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Architecture for Attribute Disentanglement. $L_i$ represents data having the attribute $q_i$. $Z_{L_i}$ is the latent representation of $L_i$. $E_{L_i}$ is a VAE with shared weights $\forall i$. 'E' refers to the Encoder module, which compresses the input image into a latent that does not contain information about the protected attribute. 'G' refers to the Generator, which is a reconstruction module that converts the latent back to the original image.
  • Figure 2: Classification backbone uses the latent representation generated by the encoder to classify into the 7 emotions.
  • Figure 3: Data distribution of the test set of RAF-DB. (a) represents the gender-wise distribution, (b) represents the age group distribution, and (c) represents the ethnic distribution.
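
The data flow described in the captions of Figures 1 and 2 can be pictured with a small sketch: one encoder 'E' whose weights are shared across all attribute groups, a generator 'G' that maps the latent back to input space, and a classification head that maps the latent to 7 emotion probabilities. This is only an illustration under stated assumptions, not the authors' implementation: the layers are plain linear maps in pure Python, the dimensions (`IN_DIM`, `LATENT_DIM`) and all class and function names are hypothetical, and the training objective that removes protected-attribute information from the latent is not shown because this section does not specify it.

```python
import math
import random

NUM_EMOTIONS = 7  # number of emotion classes, from the Figure 2 caption


def init_linear(rng, out_dim, in_dim):
    """Return (weights, bias) with small random weights; illustrative only."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class SharedEncoder:
    """Hypothetical stand-in for 'E': a single set of weights reused for every group L_i."""

    def __init__(self, rng, in_dim, latent_dim):
        self.mu_layer = init_linear(rng, latent_dim, in_dim)
        self.logvar_layer = init_linear(rng, latent_dim, in_dim)

    def encode(self, x, rng):
        mu = linear(*self.mu_layer, x)
        logvar = linear(*self.logvar_layer, x)
        # VAE reparameterisation: z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        return z, mu, logvar


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
IN_DIM, LATENT_DIM = 16, 4  # assumed toy sizes, not taken from the paper

encoder = SharedEncoder(rng, IN_DIM, LATENT_DIM)   # 'E' in Figure 1
generator = init_linear(rng, IN_DIM, LATENT_DIM)   # 'G': latent -> reconstruction
head = init_linear(rng, NUM_EMOTIONS, LATENT_DIM)  # classification backbone, Figure 2

x = [rng.random() for _ in range(IN_DIM)]          # stand-in for a flattened image
z, mu, logvar = encoder.encode(x, rng)
reconstruction = linear(*generator, z)
probs = softmax(linear(*head, z))
```

The point of the sketch is the wiring: the classifier consumes only `z`, so any fairness property of the latent carries over to the emotion prediction, while the generator is used solely for reconstruction.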