GReFEL: Geometry-Aware Reliable Facial Expression Learning under Bias and Imbalanced Data Distribution

Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Karlo Serbetar, Dong Kyu Chae

TL;DR

Facial expression learning (FEL) in the wild suffers from pronounced bias, data imbalance, and intra-/inter-class variation. GReFEL addresses these challenges with a geometry-aware reliability-balancing framework built on a window-based cross-attention Vision Transformer and learnable anchors, coupled with anchor- and attentive-correction paths for label refinement. The approach delivers state-of-the-art performance across multiple in-the-wild FEL datasets and is supported by comprehensive ablations that validate the contribution of reliability balancing to distribution stability and reduced mislabeling. This work offers a practical, robust pathway for deploying FEL systems with improved fairness and reliability in real-world scenarios.

Abstract

Reliable facial expression learning (FEL) involves the effective learning of distinctive facial expression characteristics for more reliable, unbiased, and accurate predictions in real-life settings. However, current systems struggle with FEL tasks because people's facial expressions vary with their unique facial structures, movements, tones, and demographics. Biased and imbalanced datasets compound this challenge, leading to wrong and biased prediction labels. To address these issues, we introduce GReFEL, which leverages Vision Transformers and a facial geometry-aware, anchor-based reliability balancing module to combat imbalanced data distributions, bias, and uncertainty in facial expression learning. By integrating local and global data with anchors that learn different facial data points and structural features, our approach adjusts biased and mislabeled emotions caused by intra-class disparity, inter-class similarity, and scale sensitivity, resulting in comprehensive, accurate, and reliable facial expression predictions. Our model outperforms current state-of-the-art methodologies, as demonstrated by extensive experiments on various datasets.

Paper Structure

This paper contains 20 sections, 23 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Complexities of Human Emotions (Green-colored labels are true labels).
  • Figure 2: Pipeline of GReFEL. Heavy Augmentation enhances input images, while Data Refinement selects properly distributed class batches per epoch. The Window-Based Cross-Attention ViT provides multi-level feature embeddings. An MLP predicts primary labels, and confidence is derived from the primary label distribution. Reliability balancing uses trainable anchors for similarity search and multi-head self-attention for label correction and confidence calculation. A weighted average of these determines the final label correction, resulting in a more reliable model.
  • Figure 3: Data flow in the Window-Based Cross-Attention ViT network
  • Figure 4: Confusion Matrix.
  • Figure 5: t-SNE visualization of embeddings, with Davies-Bouldin score ($\downarrow$) and Calinski-Harabasz score ($\uparrow$), comparing our model GReFEL with LA-Net and SCN on the Aff-Wild2 dataset (8 classes).
  • ...and 4 more figures
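
The reliability-balancing step described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the caption does not give formulas, so the distance-based anchor similarity, the entropy-based confidence, and the mixing rule below are all assumptions chosen for illustration, and the attentive (self-attention) correction is passed in as a precomputed distribution rather than modeled. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def anchor_correction(embedding, anchors):
    """Turn anchor similarities into a class distribution.

    anchors[c] is the list of anchor vectors for class c. Similarity is
    assumed to be the negative squared Euclidean distance (an assumption,
    not taken from the paper).
    """
    scores, owners = [], []
    for cls, vectors in enumerate(anchors):
        for anchor in vectors:
            d2 = sum((e - a) ** 2 for e, a in zip(embedding, anchor))
            scores.append(-d2)
            owners.append(cls)
    weights = softmax(scores)
    dist = [0.0] * len(anchors)
    for w, cls in zip(weights, owners):
        dist[cls] += w  # pool anchor weights per class
    return dist


def confidence(dist):
    """Assumed confidence: 1 minus normalized entropy (needs >= 2 classes)."""
    entropy = -sum(p * math.log(p) for p in dist if p > 0)
    return 1.0 - entropy / math.log(len(dist))


def corrected_distribution(primary, anchor_dist, attentive_dist):
    """Blend the two corrections, then blend the result with the primary labels."""
    c_anchor = confidence(anchor_dist)
    c_att = confidence(attentive_dist)
    total = c_anchor + c_att
    w = 0.5 if total == 0 else c_anchor / total
    correction = [w * a + (1 - w) * t for a, t in zip(anchor_dist, attentive_dist)]
    # A confident primary prediction is changed little; an uncertain one
    # leans on the correction (assumed mixing rule).
    c_primary = confidence(primary)
    return [c_primary * p + (1 - c_primary) * c for p, c in zip(primary, correction)]


# Toy example: three classes, 2-D embeddings, hypothetical values throughout.
primary = [0.40, 0.35, 0.25]            # ambiguous primary prediction
embedding = [1.0, 0.0]
anchors = [
    [[-1.0, 0.0], [-1.0, 1.0]],         # class 0 anchors
    [[1.0, 0.0], [1.0, 0.5]],           # class 1 anchors
    [[0.0, -2.0]],                      # class 2 anchor
]
attentive = [0.2, 0.6, 0.2]             # stand-in for the attention path output

anchor_dist = anchor_correction(embedding, anchors)
final = corrected_distribution(primary, anchor_dist, attentive)
predicted_class = final.index(max(final))
```

In this toy case the primary prediction narrowly favors class 0, while the embedding sits next to the class 1 anchors, so the blended distribution moves toward class 1. Because every step is a convex combination of probability distributions, the result remains a valid distribution.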