Automatic Emotion Modelling in Written Stories

Lukas Christ; Shahin Amiriparian; Manuel Milling; Ilhan Aslan; Björn W. Schuller

TL;DR

This paper proposes a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. The methods fine-tune a pretrained ELECTRA model, and the paper studies the benefits of considering a sentence's context when inferring its emotionality.

Abstract

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
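
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Concordance Correlation Coefficient can be computed between a predicted signal and an annotated one. It assumes the standard definition with population (biased) variances. The function name `concordance_cc` and the toy values are illustrative assumptions and are not taken from the authors' repository.

```python
def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Illustrative sketch only: uses population (divide-by-n) statistics.
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are the same constant; treating this as perfect
        # agreement is a convention chosen for this sketch.
        return 1.0
    return 2 * cov / denom


# Toy valence values per sentence (hypothetical, for illustration only).
gold = [1.0, 2.0, 3.0]
identical = concordance_cc([1.0, 2.0, 3.0], gold)  # perfect agreement
shifted = concordance_cc([2.0, 3.0, 4.0], gold)    # same shape, constant offset
reversed_ = concordance_cc([3.0, 2.0, 1.0], gold)  # perfectly anti-correlated
```

Unlike Pearson correlation, the coefficient penalises a constant offset between prediction and annotation: the shifted sequence above correlates perfectly with the gold signal but receives a score below 1.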

Paper Structure (30 sections, 6 figures, 5 tables)

Introduction
Related Works
Speech Emotion Recognition
Deep Learning Based SER
Domain Adaptation and Multi-Domain Learning
EmoSet - Collection of Speech Emotion Recognition Corpora
AD
PPMMK
TurkishEmo
Exploratory Data Analysis
Baselines
eGeMAPs
ResNet from Scratch
Deep Learning Architectures
Preprocessing
...and 15 more sections

Figures (6)

Figure 1: Boxplot of sample durations for each EmoSet corpus. Boxes show the interquartile range (IQR), and the whiskers extend to a maximum of $1.5\times IQR$ measured from the lower and upper quartiles. Black dots mark outliers. For readability, the scale of the y-axis is logarithmic from $10^{1}$ upwards.
Figure 2: Histogram of sample durations in EmoSet. Most of the samples are between 1 and 5 seconds in length.
Figure 3: Sample Mel spectrogram images created from speech recordings of IEMOCAP for each of its four base emotion categories. From left to right: angry, happy, neutral, and sad.
Figure 4: Architecture of the base ResNet model used in the experiments for multi-corpus SER. Three convolutional stacks extract features from the generated Mel spectrogram input. A 2D attention module is then applied to reduce the variable-length output of the convolutional base to a single feature vector for further processing by an MLP classifier head. $nf$ specifies the number of filters ($\#f$) of all convolutions inside a specific residual stack.
Figure 5: Depiction of a residual adapter module. The adapter is a small task-specific convolution ($1\times1$) that is applied in parallel to all convolutions of the shared base model. The outputs of both convolutions are then combined by elementwise summation. The figure also shows the subsequent BN, which is not shared between corpora.
...and 1 more figure
