Revisiting Emotions Representation for Recognition in the Wild

Joao Baptista Cardia Neto; Claudio Ferrari; Stefano Berretti

TL;DR

This work reframes facial emotion recognition from single-label classification to predicting probability distributions over emotion terms by mapping Valence-Arousal-Dominance (VAD) annotations to a mixture of basic emotions using Russell’s mappings. It introduces a relabeling pipeline (yielding B-AffectNet) that balances VA-space density and estimates emotion likelihoods via Gaussian VAD models, enabling rich, nuanced ground truth for training. A baseline architecture couples CNN/self-attention features with a Likelihood head and a novel Emotion Consistency Loss that encodes semantic conflicts among emotions, improving distributional predictions over fixed labels. Human-user validation supports the alignment of automatic distributions with perceptual judgments, and experiments on 8- and 14-emotion setups demonstrate the approach’s ability to capture complex affective states in the wild, with implications for more robust, interpretable FER systems. Jensen-Shannon (JS) and Kullback-Leibler (KL) divergences, distribution distances, and mixture predictions collectively show that the proposed framework better reflects natural emotional ambiguity than traditional fixed-label methods.
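As a rough illustration of how predicted and ground-truth emotion distributions can be compared with these divergences, here is a minimal sketch. The function names and the `eps` smoothing constant are assumptions made for this example; the paper's exact evaluation protocol is not given in this summary.

```python
import numpy as np


def _normalize(p):
    """Convert a non-negative vector into a probability distribution."""
    p = np.asarray(p, dtype=float)
    return p / p.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps (an assumed constant) avoids log(0)."""
    p, q = _normalize(p), _normalize(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    p, q = _normalize(p), _normalize(q)
    m = 0.5 * (p + q)  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]     # hypothetical ground-truth mixture
    predicted = [0.5, 0.3, 0.2]  # hypothetical model output
    print(kl_divergence(target, predicted), js_divergence(target, predicted))
```

JS is often the easier of the two to read because it is symmetric and bounded, whereas KL grows without limit when the prediction assigns near-zero mass to an emotion present in the target.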

Abstract

Facial emotion recognition has been typically cast as a single-label classification problem of one out of six prototypical emotions. However, that is an oversimplification that is unsuitable for representing the multifaceted spectrum of spontaneous emotional states, which are most often the result of a combination of multiple emotions contributing at different intensities. Building on this, a promising direction that was explored recently is to cast emotion recognition as a distribution learning problem. Still, such approaches are limited in that research datasets are typically annotated with a single emotion class. In this paper, we contribute a novel approach to describe complex emotional states as probability distributions over a set of emotion classes. To do so, we propose a solution to automatically re-label existing datasets by exploiting the result of a study in which a large set of both basic and compound emotions is mapped to probability distributions in the Valence-Arousal-Dominance (VAD) space. In this way, given a face image annotated with VAD values, we can estimate the likelihood of it belonging to each of the distributions, so that emotional states can be described as a mixture of emotions, enriching their description, while also accounting for the ambiguous nature of their perception. In a preliminary set of experiments, we illustrate the advantages of this solution and a new possible direction of investigation. Data annotations are available at https://github.com/jbcnrlz/affectnet-b-annotation.
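The re-labeling idea in the abstract can be sketched as follows, assuming each emotion term is modeled as an axis-aligned Gaussian in VAD space and that the per-emotion densities are normalized to sum to one. The means and standard deviations in `EMOTION_VAD` are placeholder numbers invented for illustration, not the values from the study the paper relies on, and the normalization step is an assumption of this sketch.

```python
import numpy as np

# Hypothetical (mean, std) per emotion over (valence, arousal, dominance).
# Placeholder values for illustration only; not taken from the paper.
EMOTION_VAD = {
    "happy": ((0.8, 0.5, 0.4), (0.2, 0.3, 0.3)),
    "angry": ((-0.5, 0.6, 0.3), (0.2, 0.3, 0.3)),
    "sad": ((-0.6, -0.3, -0.3), (0.2, 0.3, 0.3)),
}


def emotion_distribution(vad, table=EMOTION_VAD):
    """Map one VAD point to {emotion: probability}.

    Evaluates a diagonal-covariance Gaussian log-density per emotion,
    then normalizes the densities so they sum to one (an assumption).
    """
    x = np.asarray(vad, dtype=float)
    names = list(table)
    log_dens = []
    for name in names:
        mean, std = (np.asarray(v, dtype=float) for v in table[name])
        z = (x - mean) / std
        log_dens.append(
            -0.5 * np.sum(z**2) - np.sum(np.log(std)) - 1.5 * np.log(2 * np.pi)
        )
    log_dens = np.array(log_dens)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_dens - log_dens.max())
    probs = weights / weights.sum()
    return dict(zip(names, probs.tolist()))


if __name__ == "__main__":
    # A point lying between the "angry" and "sad" placeholders gets a mixture.
    print(emotion_distribution((-0.55, 0.1, 0.0)))
```

A point near a single emotion's mean yields an almost one-hot distribution, while a point between several means yields a mixture, which is the kind of ambiguity the abstract aims to capture.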

Paper Structure (16 sections, 16 equations, 7 figures, 2 tables)

Introduction
Related Work
Methodology
Recovering dominance dimension
Terms selection
Estimating the likelihood
Dataset Re-Balance
Data Validation
Data Validation Results
Model Architecture
Emotion Consistency Loss
Experiments
Training Details and Evaluation Metrics
Model Performance on 8 Emotions
Model Performance on 14 Emotions
...and 1 more sections

Figures (7)

Figure 1: Example of the intrinsic ambiguity of emotion display and perception. The three faces portray expressions that clearly reveal a combination of different emotions contributing at different intensities.
Figure 2: 2D/3D visualization of the valence and arousal space of the 7 universal emotions. Some of them severely overlap in the VA plane, thus making the mapping of a point (e.g., the X) to one or more emotion labels ambiguous. Adding the dominance dimension (bottom graph) helps disambiguate between them and improve the mapping.
Figure 3: Emotion distribution for the AffectNet dataset. Top: there is a clear imbalance towards the "happy" label, while negative emotions have far fewer examples. Bottom: We added images from AffWild2 (red points), choosing them from the 4th quadrant of the VA plane so as to balance negative emotions.
Figure 4: Comparison between the average likelihood across all samples of the user study and our automatic approach.
Figure 5: Example comparisons between our automatic annotations (AffectNet) and the values of the user study.
...and 2 more figures
