Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

Shen Zhang; Haojie Zhang; Jing Zhang; Xudong Zhang; Yimeng Zhuang; Jinting Wu

TL;DR

This work tackles Multimodal Emotion-Cause Pair Extraction in Conversations (ECPEC) with a three-stage pipeline: InstructERC performs emotion recognition in conversation (ERC) to label each utterance, TSAM extracts emotion-cause pairs conditioned on the target emotion, and MuTEC performs end-to-end causal span extraction. The approach integrates auxiliary tasks, a hierarchical emotion-label scheme, and multimodal cues (audio and video) to improve both emotion recognition and causal analysis. The system ranked first on both subtasks, and ablation studies confirm the contributions of instructions, MTLA, and model ensembles, while also showing when multimodal fusion helps and when it hurts. The work demonstrates that combining generative ERC, causal-entailment modeling, and multimodal information can effectively reveal emotion causes in conversations, which is useful for building more empathetic and context-aware AI systems. The objective $L_{Loss} = L_{CSE} + \beta L_{Emotion}$ jointly trains the emotion prediction and causal span extraction tasks, where $\beta$ weights the emotion term, illustrating the value of end-to-end optimization in this domain.
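The joint objective quoted at the end of the TL;DR can be illustrated with a minimal sketch. Everything beyond the two-term structure is an assumption made for illustration: the span term is taken here to be the sum of cross-entropies over start and end positions (a common choice for extractive span models, not confirmed by this page), and the names `cross_entropy`, `joint_loss`, and `weight` are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(start_logits, end_logits, start, end,
               emotion_logits, emotion, weight=1.0):
    """Span loss plus a weighted emotion loss (hypothetical helper).

    Assumption: the span term is start + end cross-entropy; `weight`
    plays the role of the coefficient on the emotion term.
    """
    l_cse = cross_entropy(start_logits, start) + cross_entropy(end_logits, end)
    l_emotion = cross_entropy(emotion_logits, emotion)
    return l_cse + weight * l_emotion


# Uniform logits give log(n) per term, which makes the result easy to check.
loss = joint_loss([0.0, 0.0], [0.0, 0.0], 0, 1,
                  [0.0, 0.0, 0.0], 2, weight=0.5)
```

With uniform logits, the span term contributes 2·log 2 and the emotion term 0.5·log 3, so the coefficient directly controls how much the emotion prediction task influences the shared parameters.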

Abstract

In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions. Unraveling the causes of those emotions is more challenging. A new task, Multimodal Emotion-Cause Pair Extraction in Conversations, requires both recognizing emotions and identifying their causal expressions. In this study, we propose a multi-stage framework that first predicts emotions and then extracts emotion-cause pairs given the target emotion. In the first stage, the Llama-2-based InstructERC is used to predict the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model is employed to extract emotion-cause pairs given the target emotion for subtask 2, while MuTEC is employed to extract causal spans for subtask 1. Our approach achieved first place in both subtasks of the competition.
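The control flow of the multi-stage framework described above can be sketched as follows. This is only a structural illustration under stated assumptions: the two stage functions are hypothetical stand-ins for the emotion recognizer and the cause extractor (they are not InstructERC or TSAM), and skipping utterances labeled `neutral` is an assumption about how target emotions are selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Utterance:
    index: int      # position in the conversation (0-based here)
    speaker: str
    text: str


# Hypothetical stage signatures, used only for this sketch.
EmotionClassifier = Callable[[List[Utterance]], List[str]]
CauseFinder = Callable[[List[Utterance], int, str], List[int]]


def extract_emotion_cause_pairs(
    conversation: List[Utterance],
    classify_emotions: EmotionClassifier,
    find_causes: CauseFinder,
    neutral_label: str = "neutral",
) -> List[Tuple[int, str, int]]:
    """Stage 1 labels every utterance; stage 2 runs per target emotion."""
    emotions = classify_emotions(conversation)
    pairs = []
    for utt, emotion in zip(conversation, emotions):
        if emotion == neutral_label:
            continue  # assumption: neutral utterances have no cause to extract
        for cause_index in find_causes(conversation, utt.index, emotion):
            pairs.append((utt.index, emotion, cause_index))
    return pairs


# Toy stand-ins so the sketch runs end to end.
def toy_classifier(conv):
    return ["joy" if "!" in u.text else "neutral" for u in conv]


def toy_cause_finder(conv, target_index, emotion):
    return [max(target_index - 1, 0)]  # pretend the previous turn is the cause


conversation = [
    Utterance(0, "A", "I got the job."),
    Utterance(1, "B", "That is wonderful news!"),
]
pairs = extract_emotion_cause_pairs(conversation, toy_classifier, toy_cause_finder)
```

Passing the stages in as callables keeps the second stage conditioned on the emotion predicted by the first, which mirrors the "given the target emotion" dependency in the abstract; a span extractor for subtask 1 could be swapped in for `find_causes` with a different return type.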

Paper Structure (35 sections, 4 equations, 4 figures, 3 tables)

Introduction
Related Works
Emotion Recognition in Conversation
Emotion Causes in Conversations
Causal Emotion Entailment
Causal Span Extraction
System Overview
System Architecture
Emotion Recognition in Conversations
InstructERC for Emotion Recognition
Hierarchical Emotion Label
Auxiliary Tasks and Instruct Design
Emotion Cause Span Extraction
Emotion-Cause Pair Extraction
TSAM Model
...and 20 more sections

Figures (4)

Figure 1: Overview of the proposed model framework.
Figure 2: The Hierarchical Structure of Emotion labels.
Figure 3: The Schematic of Instruct Template for ERC.
Figure 4: The framework of the face module.
