Dual-Constrained Dynamical Neural ODEs for Ambiguity-aware Continuous Emotion Prediction
Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
TL;DR
The paper addresses the challenge of modeling temporally evolving ambiguity in continuous emotions by extending constrained neural ODEs to predict time-varying Beta-distributed arousal and valence. The proposed CD-NODE_gamma framework imposes a rate-based smoothness constraint and a range constraint to ensure valid Beta parameters, enabling end-to-end learning from speech features. On the RECOLA dataset, ground-truth Beta parameters are inferred from multi-rater annotations via maximum a posteriori (MAP) estimation, and predictions are evaluated with concordance-based metrics. Results show state-of-the-art mean predictions and robust performance across ambiguity regimes, demonstrating the value of explicit temporal distribution modeling for emotion recognition.
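As an illustration of what a concordance-based metric measures, the sketch below computes the standard concordance correlation coefficient (CCC) between a predicted and a reference trace. This is an assumption about the kind of metric meant; the summary does not spell out the exact formulation used in the paper, and the function name `concordance_cc` and the toy arrays are hypothetical.

```python
import numpy as np

def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences.

    Uses population (biased) variance and covariance, as in the
    standard definition of the CCC.
    """
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()
    cov = ((pred - mean_p) * (gold - mean_g)).mean()
    # Penalises both low correlation and any offset between the means.
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

# Toy traces (illustrative values only, not data from the paper).
gold = np.array([0.1, 0.3, 0.5, 0.7])
ccc_same = concordance_cc(gold, gold)           # identical traces
ccc_shifted = concordance_cc(gold + 0.2, gold)  # same shape, constant offset
```

Unlike Pearson correlation, the CCC drops below 1 for the shifted trace even though the two sequences are perfectly correlated, which is why it is a common choice for continuous emotion traces.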
Abstract
Modelling emotion ambiguity has received significant attention in recent years, with advances in representing emotions as distributions to capture that ambiguity. However, comparatively little effort has been devoted to the temporal dependencies in emotion distributions, which encode ambiguity in perceived emotions that evolve smoothly over time. Recognizing the benefits of using constrained dynamical neural ordinary differential equations (CD-NODE) to model time series as dynamic processes, we propose an ambiguity-aware dual-constrained Neural ODE approach to model the dynamics of emotion distributions on arousal and valence. In our approach, we use ODEs parameterised by neural networks to estimate the distribution parameters, and we integrate additional constraints that restrict the range of the system outputs to ensure the validity of the predicted distributions. We evaluated the proposed system on the publicly available RECOLA dataset and observed very promising performance across a range of evaluation metrics.
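The two constraints described above can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes a single linear layer as the ODE function, forward-Euler integration, a `tanh`-bounded derivative as a stand-in for the rate constraint, and a softplus mapping as a stand-in for the range constraint that keeps both Beta parameters positive. All names (`simulate_beta_params`, `max_rate`, `eps`, `dt`) and the random inputs are hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); output is always positive.
    return np.logaddexp(0.0, x)

def simulate_beta_params(features, W, b, dt=0.04, max_rate=1.0, eps=1e-3):
    """Integrate a toy input-driven ODE and map its state to Beta parameters.

    features: (T, D) array of per-frame input features.
    W, b:     weights (2, 2 + D) and bias (2,) of a one-layer ODE function.
    Returns a (T, 2) array whose columns are (alpha, beta), both > 0.
    """
    T = features.shape[0]
    z = np.zeros(2)                  # latent state, one entry per parameter
    params = np.empty((T, 2))
    for t in range(T):
        inp = np.concatenate([z, features[t]])
        # Rate constraint (assumed form): derivative bounded by max_rate.
        dz = max_rate * np.tanh(W @ inp + b)
        z = z + dt * dz              # forward-Euler step
        # Range constraint (assumed form): strictly positive outputs.
        params[t] = softplus(z) + eps
    return params

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))       # random stand-in for speech features
W = rng.normal(scale=0.5, size=(2, 6))    # untrained, illustrative weights
b = np.zeros(2)

params = simulate_beta_params(features, W, b)
alpha, beta = params[:, 0], params[:, 1]
mean = alpha / (alpha + beta)             # mean of each predicted Beta distribution
```

Because softplus is 1-Lipschitz, each parameter changes by at most `dt * max_rate` per step in this sketch, so the predicted distributions evolve smoothly, and positivity of `alpha` and `beta` guarantees that every frame corresponds to a valid Beta distribution.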
