Revisiting Emotions Representation for Recognition in the Wild
Joao Baptista Cardia Neto, Claudio Ferrari, Stefano Berretti
TL;DR
This work reframes facial emotion recognition from single-label classification to predicting probability distributions over emotion terms by mapping Valence-Arousal-Dominance (VAD) annotations to a mixture of basic emotions using Russell’s mappings. It introduces a relabeling pipeline (yielding B-AffectNet) that balances VA-space density and estimates emotion likelihoods via Gaussian VAD models, enabling rich, nuanced ground truth for training. A baseline architecture couples CNN/self-attention features with a Likelihood head and a novel Emotion Consistency Loss that encodes semantic conflicts among emotions, improving distributional predictions over fixed labels. Human-user validation supports the alignment of automatic distributions with perceptual judgments, and experiments on 8- and 14-emotion setups demonstrate the approach’s ability to capture complex affective states in the wild, with implications for more robust, interpretable FER systems. Jensen-Shannon (JS) and Kullback-Leibler (KL) divergences, distribution distances, and mixture predictions collectively show that the proposed framework better reflects natural emotional ambiguity than traditional fixed-label methods.
Abstract
Facial emotion recognition has been typically cast as a single-label classification problem of one out of six prototypical emotions. However, that is an oversimplification that is unsuitable for representing the multifaceted spectrum of spontaneous emotional states, which are most often the result of a combination of multiple emotions contributing at different intensities. Building on this, a promising direction that was explored recently is to cast emotion recognition as a distribution learning problem. Still, such approaches are limited in that research datasets are typically annotated with a single emotion class. In this paper, we contribute a novel approach to describe complex emotional states as probability distributions over a set of emotion classes. To do so, we propose a solution to automatically re-label existing datasets by exploiting the result of a study in which a large set of both basic and compound emotions is mapped to probability distributions in the Valence-Arousal-Dominance (VAD) space. In this way, given a face image annotated with VAD values, we can estimate the likelihood of it belonging to each of the distributions, so that emotional states can be described as a mixture of emotions, enriching their description, while also accounting for the ambiguous nature of their perception. In a preliminary set of experiments, we illustrate the advantages of this solution and a new possible direction of investigation. Data annotations are available at https://github.com/jbcnrlz/affectnet-b-annotation.
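The relabeling idea described in the abstract (scoring a VAD annotation against per-emotion distributions and reading the result as a mixture) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each emotion is modeled as a Gaussian with independent V, A, and D axes, that likelihoods are normalized under equal priors, and the means and standard deviations below are made-up placeholders rather than values from the underlying study. The names `EMOTION_PARAMS` and `emotion_distribution` are hypothetical.

```python
import numpy as np

# Hypothetical (mean, std) per emotion along the V, A, D axes.
# Placeholder numbers for illustration only, NOT taken from the mapping study.
EMOTION_PARAMS = {
    "happy":   (np.array([0.8, 0.5, 0.4]),    np.array([0.2, 0.3, 0.3])),
    "sad":     (np.array([-0.6, -0.3, -0.3]), np.array([0.2, 0.3, 0.3])),
    "angry":   (np.array([-0.5, 0.6, 0.3]),   np.array([0.2, 0.3, 0.3])),
    "neutral": (np.array([0.0, 0.0, 0.0]),    np.array([0.3, 0.3, 0.3])),
}


def emotion_distribution(vad):
    """Map one VAD annotation to a probability distribution over emotions.

    Assumes a diagonal-covariance Gaussian per emotion and equal priors.
    """
    vad = np.asarray(vad, dtype=float)
    names = list(EMOTION_PARAMS)
    log_liks = np.empty(len(names))
    for i, name in enumerate(names):
        mean, std = EMOTION_PARAMS[name]
        z = (vad - mean) / std
        # Log-density of a Gaussian with independent axes.
        log_liks[i] = (
            -0.5 * np.sum(z ** 2)
            - np.sum(np.log(std))
            - 0.5 * len(vad) * np.log(2.0 * np.pi)
        )
    # Subtract the maximum before exponentiating for numerical stability,
    # then normalize so the likelihoods sum to one.
    probs = np.exp(log_liks - log_liks.max())
    probs /= probs.sum()
    return dict(zip(names, probs))


if __name__ == "__main__":
    # A face annotated with high valence, moderate arousal and dominance.
    for emotion, p in emotion_distribution([0.7, 0.4, 0.3]).items():
        print(f"{emotion}: {p:.3f}")
```

Under these assumptions a single VAD point yields a soft label in which several emotions can carry non-negligible mass, which is the kind of mixture description the abstract argues for in place of a single class.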
