Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning
Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller
TL;DR
This work introduces a continuous valence/arousal (V/A) framework for modeling emotional trajectories across complete stories, using a DeBERTaV3 backbone fine-tuned with context-aware inputs and a weakly supervised learning pipeline. Discrete emotion annotations from the Alm children's story corpus are mapped to the V/A space via NRC-VAD, and a gold standard is constructed with Evaluator-Weighted Estimation (EWE) to produce per-sentence targets. The authors show that incorporating contextual windows and in-domain unlabeled data improves prediction quality as measured by the Concordance Correlation Coefficient (CCC), reaching a test CCC of 0.8221 for valence and 0.7125 for arousal. Results depend strongly on context, author style, and position within the story. The study also reveals limitations in understanding event roles and global story tone, motivating future work on holistic narrative modeling and personalization with larger LLMs.
Abstract
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of $.8221$ for valence and $.7125$ for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
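The reported scores use the Concordance Correlation Coefficient, which rewards agreement in both trend and scale. As a minimal sketch of the standard CCC definition (not the authors' evaluation code), the function below computes it for two sequences of per-sentence values. The function name `concordance_cc` and the toy inputs are illustrative assumptions.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance Correlation Coefficient (standard definition).

    CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))**2),
    using population (biased) variance and covariance.
    Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variance (ddof=0)
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()

    # Undefined (division by zero) when both inputs are the same constant.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Toy example: predictions follow the trend perfectly but are shifted by 1.
gold = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
score = concordance_cc(gold, shifted)
```

Unlike Pearson correlation, which would be 1 for the shifted toy prediction, CCC penalizes the constant offset and yields 4/7 (about 0.57) here. A prediction identical to the gold signal yields exactly 1.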
