Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

TL;DR

This work introduces a continuous valence/arousal framework for modeling emotional trajectories across complete stories, using a DeBERTaV3 backbone fine-tuned with context-aware inputs and a weakly supervised learning pipeline. Discrete emotion annotations from the Alm children's story corpus are mapped to the V/A space via NRC-VAD, and a gold standard is constructed with Evaluator-Weighted Estimation to produce per-sentence targets. The authors show that incorporating contextual windows and in-domain unlabeled data improves CCC-based predictions, achieving a test valence CCC of 0.8221 and arousal CCC of 0.7125, with results showing strong dependence on context, author style, and story position. While promising, the study also reveals limitations in event-role understanding and global story tone, motivating future work on holistic narration modeling and personalization with larger LLMs.

Abstract

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of $.8221$ for valence and $.7125$ for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
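The CCC reported above differs from Pearson correlation in that it also penalizes offsets in mean and scale between predictions and targets. A minimal pure-Python sketch of the standard definition follows; the function name `ccc` and the toy inputs are illustrative and not taken from the paper.

```python
from statistics import fmean


def ccc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Standard definition:
        CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean(pred) - mean(gold))**2)
    using population (1/N) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and identical; the CCC is undefined here.
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # A constant offset leaves Pearson correlation at 1 but lowers the CCC.
    print(ccc([1, 2, 3], [1, 2, 3]))
    print(ccc([1, 2, 3], [2, 3, 4]))
```

Because of the squared mean difference in the denominator, a model that tracks the shape of a valence curve but is systematically too positive or too negative scores below 1.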

Paper Structure

This paper contains 23 sections, 1 equation, 6 figures, and 21 tables.

Figures (6)

  • Figure 1: Confusion matrices comparing different annotators' (A1, A2, A3) labels for the whole dataset. Note that for annotator 3, positive and negative surprise were not available.
  • Figure 2: Exemplary mapping from the three annotators' (A1, A2, A3) discrete annotations (top) to their respective valence (middle) and arousal (bottom) signals and the gold standard signals created via EWE (solid red lines). The annotations are taken from the story Ashputtel by the Grimm brothers, consisting of 102 sentences.
  • Figure 3: Example for the finetuning approach with context size $\mathcal{C}=2$. Valence (V) and arousal (A) predictions are obtained for all sentences at once.
  • Figure 4: Illustration of our training steps and corpora. FT is short for finetuned.
  • Figure 5: Screenshot of the annotation tool. First, the whole story must be read. Upon confirmation ("Continue"), annotation of the individual sentences follows.
  • ...and 1 more figure
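The caption of Figure 2 refers to fusing three annotators' valence and arousal signals into a gold standard via EWE. The listing here does not give the formula, so the sketch below assumes one common formulation: each annotator is weighted by the Pearson correlation between their signal and the mean signal of the remaining annotators. The function names, the clipping of negative weights to zero, the fallback to a plain mean, and the toy ratings are all assumptions made for illustration; the paper's exact variant may differ.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ewe(ratings):
    """Fuse per-annotator signals into one gold-standard signal.

    ratings: list of equal-length lists, one per annotator, one value per sentence.
    Returns (gold_signal, weights).
    """
    n, length = len(ratings), len(ratings[0])
    weights = []
    for k in range(n):
        # Mean signal of all annotators except annotator k.
        others = [
            fmean([ratings[j][t] for j in range(n) if j != k])
            for t in range(length)
        ]
        # Assumption: negative agreement is clipped to zero weight.
        weights.append(max(pearson(ratings[k], others), 0.0))

    total = sum(weights)
    if total == 0:
        # Assumption: with no usable agreement, fall back to the unweighted mean.
        weights, total = [1.0] * n, float(n)

    gold = [
        sum(w * r[t] for w, r in zip(weights, ratings)) / total
        for t in range(length)
    ]
    return gold, weights


if __name__ == "__main__":
    # Hypothetical valence signals for a four-sentence story.
    a1 = [0.1, 0.5, 0.9, 0.3]
    a2 = [0.2, 0.6, 0.8, 0.3]
    a3 = [0.5, 0.4, 0.5, 0.6]  # disagrees with the other two
    gold, weights = ewe([a1, a2, a3])
    print(weights)
    print(gold)
```

In the toy example the third annotator disagrees with the other two and therefore receives a lower weight, so the fused signal follows the two annotators who agree.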