Dual-Constrained Dynamical Neural ODEs for Ambiguity-aware Continuous Emotion Prediction

Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

TL;DR

The paper addresses the challenge of modeling temporally evolving ambiguity in continuous emotions by extending constrained neural ODEs to predict time-varying Beta distributions over arousal and valence. The proposed CD-NODE$_\gamma$ framework imposes a rate-based smoothness constraint and a range constraint that keeps the predicted Beta parameters valid, enabling end-to-end learning from speech features. On the RECOLA dataset, ground-truth Beta parameters are inferred from multi-rater annotations via maximum a posteriori (MAP) estimation, and predictions are evaluated with concordance-based metrics. Results show state-of-the-art mean predictions and robust performance across ambiguity regimes, demonstrating the value of explicit temporal distribution modeling for emotion recognition.
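
As a rough illustration of what a concordance-based metric measures, the sketch below computes the concordance correlation coefficient (CCC) between a predicted and a reference sequence. It assumes the evaluation is the standard CCC; this summary does not name the exact metric, and the function name `ccc` and the toy values are hypothetical.

```python
import numpy as np


def ccc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences.

    Assumption: standard CCC with population (biased) variance estimates.
    """
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()
    # Covariance between the two sequences.
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    # Unlike Pearson correlation, the denominator penalises a mean offset.
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)


# Hypothetical usage: a constant offset lowers CCC even though the
# two sequences are perfectly correlated.
reference = np.array([0.1, 0.4, 0.35, 0.8])
shifted = reference + 0.5
score_same = ccc(reference, reference)
score_shifted = ccc(shifted, reference)
```

The offset penalty is why CCC is often preferred to plain correlation for continuous emotion traces: a prediction must match the reference in level and scale, not only in shape.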

Abstract

There has been a significant focus on modelling emotion ambiguity in recent years, with advancements made in representing emotions as distributions to capture ambiguity. However, comparatively less effort has been devoted to temporal dependencies in emotion distributions, which encode ambiguity in perceived emotions that evolve smoothly over time. Recognizing the benefits of using constrained dynamical neural ordinary differential equations (CD-NODE) to model time series as dynamic processes, we propose an ambiguity-aware dual-constrained Neural ODE approach to model the dynamics of emotion distributions on arousal and valence. In our approach, we use ODEs parameterised by neural networks to estimate the distribution parameters, and we add constraints that restrict the range of the system outputs to ensure the validity of the predicted distributions. We evaluated our proposed system on the publicly available RECOLA dataset and observed promising performance across a range of evaluation metrics.

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Illustration of emotion dynamics. The top pane shows six different annotations of the same speech segment over time in coloured lines, and the bottom pane shows a series of distributions reflecting the ambiguous emotion states.
  • Figure 2: Proposed CD-NODE$_\gamma$ for predicting Beta distributions. The speech features $\mathbf{x}_{t_n}$ are fed into two neural networks, $f_1$ and $f_2$, which learn the dynamics of each Beta distribution. Rate constraints $\phi_i$ are applied at the outputs of the neural networks, and range constraints $\gamma_i$ are applied at the outputs of the ODE solvers.
  • Figure 3: RMSE of the proposed CD-NODE$_\gamma$ and baselines. The standard deviation (SD) range corresponding to each decile is shown on the x-axis.
  • Figure 4: RMSE of the proposed CD-NODE$_\gamma$ and baselines.
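
The pipeline described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rate constraint is taken to be a scaled `tanh`, the range constraint a softplus with a small offset, the ODE solver a fixed-step Euler scheme, and the network a small random (untrained) MLP. None of these choices, nor the names `rate_constraint`, `range_constraint`, `make_dynamics`, `predict_beta_params`, or the step size `dt`, come from the source. Only one dynamics network is shown, with a two-dimensional state holding both Beta parameters; how $f_1$ and $f_2$ divide the work is not specified in the caption.

```python
import numpy as np


def rate_constraint(dz, max_rate=1.0):
    # Assumed form of phi: bound each derivative to (-max_rate, max_rate)
    # so the integrated trajectory changes smoothly.
    return max_rate * np.tanh(dz)


def range_constraint(z, eps=1e-3):
    # Assumed form of gamma: numerically stable softplus plus an offset,
    # mapping the ODE state to strictly positive Beta parameters.
    return np.logaddexp(0.0, z) + eps


def make_dynamics(rng, state_dim, feat_dim, hidden=16):
    # Hypothetical stand-in for a dynamics network: a one-hidden-layer
    # MLP with random weights, taking the state and the speech features.
    w1 = rng.normal(scale=0.1, size=(hidden, state_dim + feat_dim))
    w2 = rng.normal(scale=0.1, size=(state_dim, hidden))

    def f(z, x):
        return w2 @ np.tanh(w1 @ np.concatenate([z, x]))

    return f


def predict_beta_params(f, features, z0, dt=0.04):
    # Fixed-step Euler integration: the rate constraint acts on the
    # network output, the range constraint on the solver output.
    z = np.asarray(z0, dtype=float)
    outputs = []
    for x in features:
        z = z + dt * rate_constraint(f(z, x))
        outputs.append(range_constraint(z))
    return np.stack(outputs)


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))  # 50 frames of 8-dim toy "speech" features
f1 = make_dynamics(rng, state_dim=2, feat_dim=8)
params = predict_beta_params(f1, features, z0=np.zeros(2))
alpha, beta = params[:, 0], params[:, 1]
beta_mean = alpha / (alpha + beta)  # mean of each predicted Beta distribution
```

Applying the range constraint after the solver, rather than inside the dynamics, means every emitted parameter pair defines a valid Beta distribution regardless of where the latent trajectory wanders, while the rate constraint limits how fast that trajectory can move between frames.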