Modelling Emotions is an Elusive Pursuit in Affective Computing

Anders Rolighed Larsen, Sneha Das, Line Clemmensen

Abstract

Affective computing, which combines sensor technology, machine learning, and psychology, has been studied for over three decades. It is employed in AI-powered technologies to enhance emotional awareness in AI systems and to detect symptoms of mental health disorders such as anxiety and depression. However, the uncertainty in such systems remains high, and the application areas are limited by categorical definitions of emotions and emotional concepts. This paper argues that categorical emotion labels obscure emotional nuance in affective computing, and that continuous dimensional definitions are therefore needed to advance the field, increase the usefulness of applications, and lower uncertainties.

Paper Structure (20 sections, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Pairwise emotion distances in VAD space: theoretical ground truth vs. IEMOCAP annotator averages. Most points fall below the diagonal, indicating that annotators perceived emotion categories as more similar than dimensional theory predicts.
  • Figure 2: Cross-modal prediction alignment matrices for each modality pair, with overall agreement percentages indicated in parentheses. Left: Text vs. Audio predictions. Center: Facial vs. Audio predictions. Right: Facial vs. Text predictions. Each matrix is normalized by the number of predictions per class made by the y-axis predictor, to account for imbalanced emotion distributions across modalities. Class support differs between modality comparisons because of faulty data entries in the facial modality.
  • Figure 3: Framewise emotion probabilities (colored lines) and corresponding entropy (black line) for a representative utterance. Red x's mark emotion transitions. Note the alignment of entropy spikes with transitions, indicating periods of increased affective ambiguity.
  • Figure 4: Weighted F1 scores for text, audio, and facial emotion recognition models across increasing levels of agreement-based filtering. Filtering is applied based on either categorical emotion annotation (CEA, solid lines) or VAD score coherence (VAD, dashed lines). Values and partitioning parameters can be found in Appendix Tables \ref{tab:cea_f1_scores} and \ref{tab:vad_f1_scores}.
  • Figure 5: A case where the system predictions agree across modalities but diverge from the ground truth. All models predicted happy, while annotators labeled the utterance as angry or frustrated. Examples are taken directly from the IEMOCAP dataset (Busso2008) and used under its licensing agreement.
  • ...and 7 more figures
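
The quantities named in the captions of Figures 1 to 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of Euclidean distance in VAD space, and the base-2 logarithm for entropy are all assumptions made here for illustration, since the captions do not state which distance metric or entropy base the paper uses.

```python
import math
from itertools import combinations


def pairwise_vad_distances(vad_by_emotion):
    """Distance between every pair of emotions in VAD space (cf. Figure 1).

    Assumption: Euclidean distance; the caption does not name the metric.
    `vad_by_emotion` maps an emotion label to a (valence, arousal, dominance) tuple.
    """
    return {
        (a, b): math.dist(vad_by_emotion[a], vad_by_emotion[b])
        for a, b in combinations(sorted(vad_by_emotion), 2)
    }


def row_normalized_agreement(y_preds, x_preds, labels):
    """Cross-modal alignment matrix and overall agreement (cf. Figure 2).

    Each row is divided by the number of predictions the y-axis predictor
    made for that class, so rows with any support sum to 1.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for y, x in zip(y_preds, x_preds):
        counts[index[y]][index[x]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Classes never predicted by the y-axis model get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])

    agreement = sum(y == x for y, x in zip(y_preds, x_preds)) / len(y_preds)
    return matrix, agreement


def frame_entropy(probs):
    """Shannon entropy of one frame's emotion distribution (cf. Figure 3).

    Assumption: entropy in bits (log base 2). Zero-probability classes
    contribute nothing and are skipped to avoid log(0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Applying `frame_entropy` to each frame of an utterance gives a curve comparable in spirit to the black line in Figure 3: a frame split evenly between two emotions scores higher than a frame dominated by one. Any VAD coordinates passed to `pairwise_vad_distances` would have to come from the paper's own ground-truth table or the IEMOCAP annotations; none are supplied here.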