BEE-NET: A deep neural network to identify in-the-wild Bodily Expression of Emotions
Mohammad Mahdi Dehshibi, David Masip
TL;DR
This work tackles the automatic identification of in-the-wild bodily expressions of emotions (AIBEE) and the influence of environmental context on it. It introduces BEE-NET, a three-stream CNN that fuses scene/place and object cues with the emotion stream through a differentiable, Bayesian-inspired late fusion (probabilistic pooling) that models joint and conditional relationships. On BoLD, BEE-NET achieves an Emotion Recognition Score (ERS) of 66.33%, surpassing the prior state of the art by 2.07%, and ablations confirm the critical roles of place context and the proposed fusion scheme. The approach demonstrates that context-aware, end-to-end learning can improve the robustness of AIBEE for real-world applications.
Abstract
In this study, we investigate how environmental factors, specifically the scenes and objects involved, can affect the expression of emotions through body language. To this end, we introduce a novel multi-stream deep convolutional neural network named BEE-NET. We also propose a new late fusion strategy that incorporates meta-information on places and objects as prior knowledge in the learning process. Our proposed probabilistic pooling model leverages this information to generate a joint probability distribution over both the available and the anticipated unavailable contextual information in latent space. Importantly, our fusion strategy is differentiable, which allows end-to-end training and captures hidden associations among data points without requiring further post-processing or regularisation. To evaluate our deep model, we use the Body Language Database (BoLD), which is currently the largest available database for the Automatic Identification of the in-the-wild Bodily Expression of Emotions (AIBEE). Our experimental results demonstrate that the proposed approach surpasses the current state of the art in AIBEE by a margin of 2.07%, achieving an Emotion Recognition Score (ERS) of 66.33%.
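To make the idea of a differentiable, context-aware late fusion more concrete, the following is a minimal NumPy sketch. It is not the probabilistic pooling model used in BEE-NET, whose exact form is not given in this abstract. It substitutes a simple product-of-experts fusion in log space, and it treats emotion as a single categorical variable for simplicity. All names (`fuse_streams`, `emo_given_place`, `emo_given_object`) and all tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_streams(emotion_logits, place_probs, object_probs,
                 emo_given_place, emo_given_object, eps=1e-8):
    """Illustrative late fusion of three streams (an assumption, not BEE-NET's layer).

    emotion_logits   : (B, K) raw scores from the emotion stream
    place_probs      : (B, P) place distribution from the scene stream
    object_probs     : (B, O) object distribution from the object stream
    emo_given_place  : (P, K) hypothetical table of P(emotion | place)
    emo_given_object : (O, K) hypothetical table of P(emotion | object)
    """
    p_emotion = softmax(emotion_logits)
    # Marginalise over the context variable: sum_c P(emotion | c) * P(c | input).
    p_from_place = place_probs @ emo_given_place
    p_from_object = object_probs @ emo_given_object
    # Multiply the three estimates in log space, then renormalise.
    log_joint = (np.log(p_emotion + eps)
                 + np.log(p_from_place + eps)
                 + np.log(p_from_object + eps))
    return softmax(log_joint)


# Toy example: 2 samples, 3 emotion classes, 4 places, 5 objects.
rng = np.random.default_rng(0)
B, K, P, O = 2, 3, 4, 5
emotion_logits = rng.normal(size=(B, K))
place_probs = softmax(rng.normal(size=(B, P)))
object_probs = softmax(rng.normal(size=(B, O)))
emo_given_place = softmax(rng.normal(size=(P, K)))
emo_given_object = softmax(rng.normal(size=(O, K)))

fused = fuse_streams(emotion_logits, place_probs, object_probs,
                     emo_given_place, emo_given_object)
```

Every operation here (matrix product, logarithm, softmax) is differentiable, so in an automatic-differentiation framework gradients could flow from the fused output back into all three streams and into the conditional tables. This is the property that makes end-to-end training possible. When both conditional tables are uniform, the context carries no information and the fused output reduces to the emotion stream's own prediction.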
