Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language

Yoshiki Tanaka, Ryuichi Uehara, Koji Inoue, Michimasa Inaba

TL;DR

The paper proposes a novel task, Emotion Transcription in Conversation (ETC), which focuses on generating natural language descriptions that accurately reflect speakers' emotional states within conversational contexts, and presents a Japanese dataset of text-based dialogues annotated with participants' self-reported emotional states.

Abstract

Emotion Recognition in Conversation (ERC) is critical for enabling natural human-machine interactions. However, existing methods predominantly employ categorical or dimensional emotion annotations, which often fail to adequately represent complex, subtle, or culturally specific emotional nuances. To overcome this limitation, we propose a novel task named Emotion Transcription in Conversation (ETC). This task focuses on generating natural language descriptions that accurately reflect speakers' emotional states within conversational contexts. To support the ETC task, we constructed a Japanese dataset comprising text-based dialogues annotated with participants' self-reported emotional states, described in natural language. The dataset also includes emotion category labels for each transcription, enabling quantitative analysis and application to ERC. We benchmarked baseline models and found that, while fine-tuning on our dataset enhances model performance, current models still struggle to infer implicit emotional states. We expect the ETC task to encourage further research into more expressive emotion understanding in dialogue. The dataset is publicly available at https://github.com/UEC-InabaLab/ETCDataset.

Paper Structure

This paper contains 29 sections, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Example of a dialogue with emotion transcriptions from our dataset (translated from Japanese). Each transcription is also annotated with multi-label emotion categories.
  • Figure 2: Co-occurrence matrix of emotion labels. Each cell indicates the number of transcriptions annotated with both corresponding labels.
  • Figure 3: The dialogue slot used for conducting conversations and entering emotion transcriptions.
  • Figure 4: Distribution of Big Five personality traits among the 199 participants.
  • Figure 5: The prompt template for the ETC task (translated from Japanese). The model is expected to generate only the emotion transcription.
  • ...and 2 more figures
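
The caption of Figure 2 describes a co-occurrence matrix whose cells count the transcriptions annotated with both of two labels. A minimal sketch of that counting is shown below. The function name `cooccurrence_counts`, the input layout (one list of labels per transcription), and the label names are illustrative assumptions rather than the dataset's actual schema or label set, and only pairs of distinct labels are counted because the caption does not say how the diagonal is defined.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count, for each unordered pair of distinct labels, the items carrying both."""
    counts = Counter()
    for labels in label_sets:
        # set() drops repeated labels; sorted() gives each pair a canonical order.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


# Hypothetical multi-label annotations, one list per transcription.
# These label names are placeholders, not the dataset's categories.
example = [
    ["joy", "surprise"],
    ["joy", "surprise", "anticipation"],
    ["sadness"],
    ["joy", "anticipation"],
]

counts = cooccurrence_counts(example)
for (label_a, label_b), n in sorted(counts.items()):
    print(f"{label_a} + {label_b}: {n}")
```

Keying the counter on sorted pairs keeps the result symmetric by construction, so a full matrix can be filled by writing each count into both `(a, b)` and `(b, a)` cells.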