Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations
Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu
TL;DR
This work tackles Multimodal Emotion-Cause Pair Extraction in Conversations (ECPEC) with a three-stage pipeline: emotion recognition in conversation (ERC) with InstructERC to label the emotion of each utterance, a two-stream attention model (TSAM) to extract emotion-cause pairs conditioned on the target emotion, and MuTEC for end-to-end causal span extraction. The approach integrates auxiliary tasks, a hierarchical emotion-label scheme, and multimodal cues (audio and video) to improve both emotion recognition and causal analysis. The system ranks first on both subtasks, and ablation studies confirm the contributions of instructions, MTLA, and model ensembles while also showing when multimodal fusion helps and when it hurts. The work demonstrates that combining generative ERC, causal-entailment modeling, and multimodal information can effectively reveal emotion causes in conversations, with practical benefits for more empathetic and context-aware AI systems. The joint objective $L_{Loss} = L_{CSE} + \beta L_{Emotion}$ trains emotion prediction and causal span extraction together, with $\beta$ weighting the emotion term, which illustrates the value of end-to-end optimization in this domain.
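The joint objective $L_{Loss} = L_{CSE} + \beta L_{Emotion}$ can be illustrated with a small, dependency-free sketch. It assumes that both terms are cross-entropy losses computed from raw logits and, as a further assumption, that $L_{CSE}$ averages a start-token and an end-token cross-entropy, as in common span-extraction heads. The function names (`log_softmax`, `cross_entropy`, `joint_loss`) and the default `beta=1.0` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the gold class."""
    return -log_softmax(logits)[target]


def joint_loss(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    start_gold: int,
    end_gold: int,
    emotion_logits: Sequence[float],
    emotion_gold: int,
    beta: float = 1.0,
) -> float:
    """L_Loss = L_CSE + beta * L_Emotion (hypothetical instantiation of each term)."""
    # Assumption: L_CSE is the mean of the start- and end-position cross-entropies.
    l_cse = 0.5 * (cross_entropy(start_logits, start_gold) + cross_entropy(end_logits, end_gold))
    # Assumption: L_Emotion is a plain cross-entropy over emotion classes.
    l_emotion = cross_entropy(emotion_logits, emotion_gold)
    return l_cse + beta * l_emotion
```

Setting `beta` to 0 reduces the objective to span extraction alone, while larger values push the shared encoder toward emotion prediction; the sketch only shows how the two terms combine, not how the paper tunes $\beta$.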
Abstract
In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions. Unraveling the causes of those emotions is even more challenging. A new task, Multimodal Emotion-Cause Pair Extraction in Conversations, aims to recognize the emotions expressed in a conversation and identify their causal expressions. In this study, we propose a multi-stage framework that first generates the emotion of each utterance and then extracts emotion-cause pairs given the target emotion. In the first stage, the Llama-2-based InstructERC is used to extract the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model is employed to extract emotion-cause pairs given the target emotion for Subtask 2, while MuTEC is employed to extract causal spans for Subtask 1. Our approach achieved first place in both subtasks of the competition.
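The data flow of the multi-stage framework can be sketched as a small orchestration function: emotion recognition runs first, and its labels then condition both cause extractors. The three models are replaced by hypothetical callables (`EmotionTagger`, `PairExtractor`, `SpanExtractor`) that only mark where InstructERC, the two-stream attention model, and MuTEC would plug in. Skipping utterances labeled `"neutral"` and the `(cause_index, start, end)` span format are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Utterance:
    speaker: str
    text: str


Conversation = List[Utterance]
# Hypothetical stand-in for stage 1 (the InstructERC role): one emotion label per utterance.
EmotionTagger = Callable[[Conversation], List[str]]
# Hypothetical stand-in for the Subtask 2 model (the two-stream attention model role):
# indices of cause utterances for a target utterance and its emotion.
PairExtractor = Callable[[Conversation, int, str], List[int]]
# Hypothetical stand-in for the Subtask 1 model (the MuTEC role):
# (cause_index, start_token, end_token) spans for a target utterance and its emotion.
SpanExtractor = Callable[[Conversation, int, str], List[Tuple[int, int, int]]]


def run_pipeline(
    conv: Conversation,
    tag: EmotionTagger,
    extract_pairs: PairExtractor,
    extract_spans: SpanExtractor,
) -> Tuple[List[Tuple[int, str, int]], List[Tuple[int, str, int, int, int]]]:
    """Run emotion recognition first, then condition both cause extractors on its labels."""
    emotions = tag(conv)
    if len(emotions) != len(conv):
        raise ValueError("the tagger must return exactly one label per utterance")

    pairs = []  # Subtask 2 output: (emotion_index, emotion, cause_index)
    spans = []  # Subtask 1 output: (emotion_index, emotion, cause_index, start, end)
    for i, emotion in enumerate(emotions):
        # Assumption: utterances labeled "neutral" are not queried for causes.
        if emotion == "neutral":
            continue
        for cause in extract_pairs(conv, i, emotion):
            pairs.append((i, emotion, cause))
        for cause, start, end in extract_spans(conv, i, emotion):
            spans.append((i, emotion, cause, start, end))
    return pairs, spans


if __name__ == "__main__":
    # Toy lambdas replace the trained models; the outputs are illustrative only.
    demo = [Utterance("A", "I got the job!"), Utterance("B", "That is wonderful news!")]
    demo_pairs, demo_spans = run_pipeline(
        demo,
        tag=lambda c: ["neutral", "joy"],
        extract_pairs=lambda c, i, e: [0],
        extract_spans=lambda c, i, e: [(0, 1, 4)],
    )
    print(demo_pairs)
    print(demo_spans)
```

Passing the models in as callables keeps the stages decoupled, so any one of them can be swapped or ensembled without touching the orchestration logic.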
