SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

TL;DR

This paper introduces SemEval-2024 Task 3, Multimodal Emotion Cause Analysis in Conversations, which addresses the challenge of identifying the causes of emotions in conversations using text, audio, and video. It defines two subtasks, TECPE and MECPE, and releases the ECF 2.0 dataset, derived from Friends, with emotion-cause pairs annotated across modalities. The authors detail the data collection, annotation, and evaluation protocols, report on the participating systems and their performance, and discuss biases, the role of large language models, and multimodal fusion opportunities. The work provides a benchmark, baselines, and practical insights to advance robust multimodal emotion-cause analysis for empathetic dialogue systems and automated support tools.

Abstract

The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task attracted 143 registrations and 216 successful submissions. In this paper, we introduce the task, dataset, and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.

Paper Structure (22 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: An example of our task and annotated dataset. Each arc points from the cause utterance to the emotion it triggers. The textual cause spans and the visual cause evidence are highlighted in yellow. Background: Chandler and his girlfriend Monica walked into the casino (they had a quarrel earlier but made up soon) and then started a conversation with Phoebe.
  • Figure 2: The distribution of conversation lengths. The horizontal axis represents the number of utterances, and the vertical axis represents the number of conversations.
  • Figure 3: The distribution of emotions. The horizontal axis represents the number of utterances, and the vertical axis represents emotion categories.