Navigating Label Ambiguity for Facial Expression Recognition in the Wild

JunGyu Lee, Yeji Choi, Haksub Kim, Ig-Jae Kim, Gi Pyo Nam

TL;DR

Facial expression recognition in the wild is hindered by label ambiguity and severe class imbalance. The paper introduces Navigating Label Ambiguity (NLA), a framework that combines Noise-aware Adaptive Weighting (NAW) with consistency regularization to dynamically emphasize ambiguous samples and stabilize latent distributions. The final objective is $\mathcal{L}_{total}=\lambda\mathcal{L}_{NAW-CE}+(1-\lambda)\mathcal{L}_{reg}$. NAW evaluates a multivariate Gaussian over the pair $(p_i^{GT}, p_i^{NN})$, the prediction scores for the ground truth (GT) and the nearest negative (NN), to compute a per-sample weight $w^{*}$ and form $\mathcal{L}_{NAW-CE}=(1+w^{*})\mathcal{L}_{CE}$; the epoch-dependent mean $\bm{\mu}$ and covariance $\bm{\Sigma}$ steer the emphasis toward ambiguous minority samples. The regularization term $\mathcal{L}_{reg}$ uses the Jensen-Shannon divergence to enforce consistency between the original and horizontally flipped views. Empirically, NLA achieves state-of-the-art overall and mean accuracies on RAF-DB, FERPlus, and AffectNet, with notable improvements for minority classes and resilience to label noise and dataset imbalance. These results show the practical value of addressing label ambiguity and data bias jointly, and suggest that the approach may carry over to other tasks with noisy and imbalanced labels.
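To make the weighting concrete, here is a minimal pure-Python sketch of a Gaussian weight over the GT/NN score pair and the resulting weighted cross-entropy $(1+w^{*})\mathcal{L}_{CE}$. Several details are assumptions, not taken from the paper: the nearest negative is treated as the highest-scoring non-GT class, the density is rescaled so that its peak equals 1, and the function names and the mean and covariance values are hypothetical placeholders. The paper's epoch-dependent schedule for the mean and covariance is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gt_and_nn_scores(probs, gt):
    """Return (p_gt, p_nn).

    Assumption: the "nearest negative" is the non-GT class with the
    highest predicted score.
    """
    p_gt = probs[gt]
    p_nn = max(p for k, p in enumerate(probs) if k != gt)
    return p_gt, p_nn


def gaussian_weight(p_gt, p_nn, mu, sigma):
    """Bivariate Gaussian evaluated at (p_gt, p_nn).

    Assumption: the density is divided by its peak value, so the
    weight lies in (0, 1]. `sigma` is a 2x2 covariance given as
    ((a, b), (c, d)).
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = p_gt - mu[0], p_nn - mu[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha)


def naw_ce(logits, gt, mu, sigma):
    """Return ((1 + w) * CE, w) for a single sample."""
    probs = softmax(logits)
    p_gt, p_nn = gt_and_nn_scores(probs, gt)
    w = gaussian_weight(p_gt, p_nn, mu, sigma)
    ce = -math.log(probs[gt])
    return (1.0 + w) * ce, w


# Hypothetical placeholder parameters: a Gaussian centred where the
# GT and NN scores are roughly equal, i.e. on ambiguous samples.
MU = (0.45, 0.45)
SIGMA = ((0.02, 0.0), (0.0, 0.02))
```

With these placeholder parameters, a sample whose GT and NN scores are close receives a weight near 1, while a sample whose NN score dominates the GT score (a likely noisy label) receives a weight near 0 and is trained with almost plain cross-entropy.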

Abstract

Facial expression recognition (FER) remains a challenging task due to label ambiguity caused by the subjective nature of facial expressions and noisy samples. Additionally, class imbalance, which is common in real-world datasets, further complicates FER. Although many studies have shown impressive improvements, they typically address only one of these issues, leading to suboptimal results. To tackle both challenges simultaneously, we propose a novel framework called Navigating Label Ambiguity (NLA), which is robust under real-world conditions. The motivation behind NLA is that dynamically estimating and emphasizing ambiguous samples at each iteration helps mitigate noise and class imbalance by reducing the model's bias toward majority classes. To achieve this, NLA consists of two main components: Noise-aware Adaptive Weighting (NAW) and consistency regularization. Specifically, NAW adaptively assigns higher importance to ambiguous samples and lower importance to noisy ones, based on the correlation between the intermediate prediction scores for the ground truth and the nearest negative. Moreover, we incorporate a regularization term to ensure consistent latent distributions. Consequently, NLA enables the model to progressively focus on more challenging ambiguous samples, which primarily belong to the minority class, in the later stages of training. Extensive experiments demonstrate that NLA outperforms existing methods in both overall and mean accuracy, confirming its robustness against noise and class imbalance. To the best of our knowledge, this is the first framework to address both problems simultaneously.
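The consistency term and the combined objective can be sketched in the same spirit. The TL;DR names the Jensen-Shannon divergence between the original and flipped views and the combination $\lambda\mathcal{L}_{NAW-CE}+(1-\lambda)\mathcal{L}_{reg}$; the sketch below assumes the divergence is taken between the two softmax outputs of a single sample, which may differ from the distributions the paper actually compares. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def consistency_reg(logits_original, logits_flipped):
    """Assumed form of L_reg: JSD between the predictions for the two views."""
    return js_divergence(softmax(logits_original), softmax(logits_flipped))


def total_loss(naw_ce_value, reg_value, lam):
    """L_total = lam * L_NAW-CE + (1 - lam) * L_reg."""
    return lam * naw_ce_value + (1.0 - lam) * reg_value
```

The Jensen-Shannon divergence is symmetric and bounded above by $\ln 2$, so neither view is treated as the reference and the regularizer cannot grow without bound, which is one plausible reason to prefer it over a one-sided KL term.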

Paper Structure

This paper contains 23 sections, 10 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Visualization of estimating sample ambiguity. The visual analysis on the right illustrates how the correlation between the prediction scores for ground truth (GT) and nearest negative (NN) serves as a criterion for categorizing samples as clean, ambiguous, or noisy. The prediction scores for the GT and the NN are represented by green and red bars in the probability distribution on the left side, respectively (Ne: Neutral, Ha: Happiness).
  • Figure 2: The framework of Navigating Label Ambiguity (NLA). NLA consists of two main components: 1) a Noise-aware Adaptive Weighting (NAW), which dynamically assigns weights to each sample based on the intermediate prediction scores for GT and NN, and 2) consistency regularization using pairs of original and horizontally flipped images.
  • Figure 3: Visualization of the effect of NAW by prediction results. This figure illustrates how NAW enhances the model's ability to distinguish between clean, ambiguous, and noisy samples throughout the training process. (a) shows the results when the prediction is true, and (b) shows the results when the prediction is false.
  • Figure 4: Imbalanced distribution of training samples in the wild FER dataset.
  • Figure 5: Visualization of the training process of our method. This figure demonstrates how our method enhances discriminative ability by adaptively assigning weights to each sample through NAW.
  • ...and 2 more figures