Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning
Qing Zhu, Qirong Mao, Jialin Zhang, Xiaohua Huang, Wenming Zheng
TL;DR
This work tackles group-level emotion recognition (GER) in unconstrained scenes by explicitly modeling uncertainty. It introduces Uncertainty-Aware Learning (UAL), which maps each individual to a Gaussian in latent space, $P(z_n|x_n^I)=\mathcal{N}(z_n;\mu_n,\sigma_n^2 I)$, and uses Monte Carlo sampling to generate diverse, uncertainty-informed representations, $z_n^* = \frac{1}{M}\sum_{m=1}^M(\mu_n+\epsilon_m\sigma_n)$. The model comprises three branches (face, object, scene) with an image-enhancement module, and a Proportional-Weighted Fusion Strategy (PWFS) fuses the branch predictions using uncertainty-derived weights. Key contributions include uncertainty-sensitive scores for adaptive fusion, KL/rank/rec loss terms to stabilize training, a reconstruction-like penalty to curb variance oscillations, and extensive experiments on GAFF2, GAFF3, and MultiEmoVA that demonstrate improved robustness and generalization. By producing robust, diverse representations under real-world uncertainty, the approach supports more reliable affective AI in crowded or noisy environments.
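The sampling formula for $z_n^*$ can be illustrated with a minimal sketch. It assumes that $\epsilon_m$ is drawn from a standard normal and applied element-wise, which the summary does not state explicitly. The function name `sample_embedding` and the toy values of `mu` and `sigma` are hypothetical and not taken from the paper.

```python
import random

def sample_embedding(mu, sigma, num_samples=10, rng=None):
    """Average M reparameterized draws z = mu + eps * sigma.

    Assumption: eps ~ N(0, 1) is sampled independently per dimension.
    """
    rng = rng or random.Random()
    total = [0.0] * len(mu)
    for _ in range(num_samples):
        for d in range(len(mu)):
            eps = rng.gauss(0.0, 1.0)
            total[d] += mu[d] + eps * sigma[d]
    # Monte Carlo average over the M draws
    return [t / num_samples for t in total]

# Toy mean and standard deviation for one individual (hypothetical values)
mu = [0.5, -1.0, 2.0]
sigma = [0.1, 0.3, 0.0]
z_star = sample_embedding(mu, sigma, num_samples=1000, rng=random.Random(0))
```

With a small $M$ the averaged embedding still varies from call to call, which is what yields diverse predictions. As $M$ grows it concentrates around $\mu_n$, and a dimension with zero variance reproduces its mean exactly.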
Abstract
Group-level emotion recognition (GER) is an inseparable part of human behavior analysis, aiming to recognize an overall emotion in a multi-person scene. However, existing methods are devoted to combining diverse emotion cues while ignoring the inherent uncertainties of unconstrained environments, such as congestion and occlusion within a group. Additionally, since only group-level labels are available, inconsistent emotion predictions among individuals in one group can confuse the network. In this paper, we propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER. By explicitly modeling the uncertainty of each individual, we utilize a stochastic embedding drawn from a Gaussian distribution instead of a deterministic point embedding. This representation captures the probabilities of different emotions and, through its stochasticity, generates diverse predictions during the inference stage. Furthermore, uncertainty-sensitive scores are adaptively assigned as the fusion weights of individuals' faces within each group. Moreover, we develop an image enhancement module to enhance the model's robustness against severe noise. The overall three-branch model, encompassing face, object, and scene components, is guided by a proportional-weighted fusion strategy and integrates the proposed uncertainty-aware method to produce the final group-level output. Experimental results demonstrate the effectiveness and generalization ability of our method across three widely used databases.
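The idea of using uncertainty-sensitive scores as fusion weights for faces can be sketched as follows. The abstract does not specify how the scores are computed, so this example assumes a softmax over negated per-face uncertainties. That scoring rule, the function name `fuse_by_uncertainty`, and the toy inputs are all illustrative assumptions rather than the authors' formulation.

```python
import math

def fuse_by_uncertainty(probs, uncertainties):
    """Weighted average of per-face predictions; lower uncertainty gets a larger weight.

    Assumption: weights are a softmax over negated scalar uncertainties.
    """
    neg = [-u for u in uncertainties]
    shift = max(neg)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in neg]
    total = sum(exps)
    weights = [e / total for e in exps]
    num_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]
    return fused, weights

# Three faces, three emotion classes (hypothetical values)
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
uncertainties = [0.2, 1.5, 0.3]
fused, weights = fuse_by_uncertainty(probs, uncertainties)
```

Here the second face, which might be heavily occluded, has the highest uncertainty and contributes least to the group-level prediction, so its conflicting vote does not dominate the two more confident faces.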
