Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks
Sayyed M. Zahiri, Jinho D. Choi
TL;DR
This work tackles text-based emotion detection in multiparty dialogue. It introduces a large corpus of Friends transcripts annotated with seven emotions, and it proposes sequence-based convolutional neural networks (SCNNs) with attention that exploit inter-utterance context. The SCNN models fuse the representation of the current utterance with those of prior utterances through concatenation, convolutional fusion, and attention, and they improve over a baseline CNN and an RNN-CNN. The attentive SCNN variants, particularly SCNN_c^a, perform best in both the seven-class setting and the reduced three-class setting, although long scenes and the dominance of the neutral class remain challenging. The paper contributes an annotated resource and an attention-driven approach to emotion detection in dialogue, with potential applications in sentiment analysis, dialogue systems, and multimedia analytics.
Abstract
While there have been significant advances in detecting emotions from speech and images, emotion detection on text is still under-explored and remains an active research field. This paper introduces a corpus for text-based emotion detection on multiparty dialogue, as well as deep neural models that outperform existing approaches for document classification. We first present a new corpus that provides annotation of seven emotions on consecutive utterances in dialogues extracted from the TV show Friends. We then propose four types of sequence-based convolutional neural network models with attention that leverage the sequence information encapsulated in dialogue. Our best model achieves accuracies of 37.9% and 54% for fine- and coarse-grained emotions, respectively. Given the difficulty of this task, these results are promising.
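
To make the idea of leveraging sequence information concrete, the following is a minimal sketch, not the paper's formulation. It assumes each utterance has already been encoded as a fixed-length feature vector (for example by a CNN), scores previous utterances against the current one with a plain dot product, and concatenates the resulting attention-weighted context onto the current vector. The function names `softmax`, `dot`, and `attend_and_concat` and the dot-product scoring are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_and_concat(current, previous):
    """Fuse the current utterance vector with context from earlier utterances.

    Assumption: dot-product attention stands in for whatever scoring
    function a real model would learn. With no previous utterances
    (the first utterance of a scene) the context is a zero vector.
    """
    dim = len(current)
    if not previous:
        context = [0.0] * dim
    else:
        weights = softmax([dot(current, p) for p in previous])
        context = [
            sum(w * p[i] for w, p in zip(weights, previous))
            for i in range(dim)
        ]
    # The fused vector would then be passed to an emotion classifier.
    return list(current) + context


# Hypothetical 2-dimensional utterance vectors, for illustration only.
fused = attend_and_concat([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
```

The fused vector is twice the length of a single utterance vector, so a downstream classifier sees both the current utterance and a summary of its dialogue history.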
