GReFEL: Geometry-Aware Reliable Facial Expression Learning under Bias and Imbalanced Data Distribution
Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Karlo Serbetar, Dong Kyu Chae
TL;DR
Facial expression learning (FEL) in the wild suffers from pronounced bias, data imbalance, and intra-/inter-class variation. GReFEL addresses these challenges with a geometry-aware reliability-balancing framework built on a window-based cross-attention Vision Transformer and learnable anchors, coupled with anchor- and attentive-correction paths for label refinement. The approach delivers state-of-the-art performance across multiple in-the-wild FEL datasets and is supported by comprehensive ablations that validate the contribution of reliability balancing to distribution stability and reduced mislabeling. This work offers a practical, robust pathway for deploying FEL systems with improved fairness and reliability in real-world scenarios.
Abstract
Reliable facial expression learning (FEL) involves effectively learning distinctive facial expression characteristics to make reliable, unbiased, and accurate predictions in real-life settings. However, current systems struggle with FEL tasks because facial expressions vary widely across people due to their unique facial structures, movements, tones, and demographics. Biased and imbalanced datasets compound this challenge, leading to incorrect and biased predicted labels. To tackle these challenges, we introduce GReFEL, which leverages Vision Transformers and a facial geometry-aware, anchor-based reliability balancing module to combat imbalanced data distributions, bias, and uncertainty in facial expression learning. By integrating local and global data with anchors that learn different facial data points and structural features, our approach corrects biased and mislabeled emotions caused by intra-class disparity, inter-class similarity, and scale sensitivity, resulting in comprehensive, accurate, and reliable facial expression predictions. Extensive experiments on various datasets show that our model outperforms current state-of-the-art methods.
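To make the anchor-based correction idea more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each anchor is a point in embedding space with a fixed class label, that similarity is a softmax over negative squared Euclidean distance, and that the corrected prediction is a simple convex blend of the primary prediction and the anchor-derived distribution. The function name `anchor_corrected_probs` and the parameters `temperature` and `blend` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def anchor_corrected_probs(embeddings, primary_probs, anchors, anchor_labels,
                           num_classes, temperature=1.0, blend=0.5):
    """Blend primary class probabilities with an anchor-derived distribution.

    Assumptions (not taken from the paper): squared Euclidean distance,
    softmax weighting over anchors, and a fixed convex blend factor.

    embeddings:    (N, D) feature vectors, e.g. from a ViT backbone
    primary_probs: (N, C) class probabilities from the main classifier
    anchors:       (A, D) anchor points in the same embedding space
    anchor_labels: (A,)   integer class label of each anchor
    """
    # Pairwise squared distances between samples and anchors: shape (N, A).
    diff = embeddings[:, None, :] - anchors[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)

    # Closer anchors receive larger weights; each row sums to 1.
    weights = softmax(-dist2 / temperature, axis=1)

    # Turn anchor weights into a class distribution: shape (N, C).
    one_hot = np.eye(num_classes)[anchor_labels]
    anchor_probs = weights @ one_hot

    # Convex combination keeps each row a valid probability distribution.
    return (1.0 - blend) * primary_probs + blend * anchor_probs


# Usage: an uncertain primary prediction is pulled toward the nearby anchor's class.
anchors = np.array([[0.0, 0.0], [10.0, 10.0]])
anchor_labels = np.array([0, 1])
embeddings = np.array([[9.5, 10.2]])
primary_probs = np.array([[0.5, 0.5]])
corrected = anchor_corrected_probs(embeddings, primary_probs, anchors,
                                   anchor_labels, num_classes=2)
```

In this toy setup the sample lies next to the class-1 anchor, so the blended distribution favors class 1 even though the primary classifier was undecided. In a trained model the anchors and the blending would be learned rather than fixed as they are here.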
