Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks

Sayyed M. Zahiri, Jinho D. Choi

TL;DR

This work tackles text-based emotion detection in multiparty dialogue by introducing a large corpus of Friends transcripts annotated with seven emotions and by proposing sequence-based CNNs (SCNNs) with attention to leverage inter-utterance context. The SCNN models fuse the current utterance representation with those of prior utterances through concatenation, convolutional fusion, and augmented attention mechanisms, yielding improvements over a baseline CNN and an RNN-CNN. Key findings show that attentive SCNN variants, particularly SCNN$_c^a$, achieve state-of-the-art performance in both the seven-class and the reduced three-class settings, although long scenes and the dominance of the neutral class remain challenging. The paper contributes a valuable resource and a scalable, attention-driven approach to emotion detection in dialogue, with potential impact on sentiment analysis, dialogue systems, and multimedia analytics.

Abstract

While there have been significant advances in detecting emotions from speech and images, emotion detection on text is still under-explored and remains an active research field. This paper introduces a corpus for text-based emotion detection on multiparty dialogue as well as deep neural models that outperform the existing approaches for document classification. We first present a new corpus that provides annotation of seven emotions on consecutive utterances in dialogues extracted from the show Friends. We then suggest four types of sequence-based convolutional neural network models with attention that leverage the sequence information encapsulated in dialogue. Our best model achieves accuracies of 37.9% and 54% for fine- and coarse-grained emotions, respectively. Given the difficulty of this task, these results are promising.

Paper Structure

This paper contains 22 sections, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Emotions of the main characters within a scene. Rows correspond to the main characters' emotions and columns show the utterance number. No talking occurs in the white regions.
  • Figure 2: Confusion matrix of corpus annotation. Each matrix cell contains the raw count.
  • Figure 3: The overview of the sequence-based CNN using concatenation (SCNN$_c$), SM: Softmax.
  • Figure 4: The SCNN$_v$ model, SM: Softmax.
  • Figure 5: The overview of the SCNN$_c^a$ model.
  • ...and 3 more figures