UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Guimin Hu, Zhihong Zhu, Daniel Hershcovich, Lijie Hu, Hasti Seifi, Jiayuan Xie

TL;DR

UniMEEC jointly addresses multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) through a unified multimodal causal prompt framework that treats both tasks as mask-prediction problems. The method combines a Multimodal Causal Prompt (MCP) with a Task-specific Hierarchical Context (THC) to capture modality-specific cues and conversation-level dependencies, enabling cross-task learning and improved causal reasoning between an emotion and its causes. Results on MELD, IEMOCAP, ConvECPE, and ECF show consistent improvements over previous state-of-the-art methods in both emotion recognition and emotion-cause extraction, highlighting the practical benefit of incorporating causality into a unified model. The work underscores the potential of causal prompts and hierarchical context for multimodal emotion analysis in more empathetic and context-aware dialog systems.

Abstract

Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) have recently garnered significant attention. Emotions are expressions of affect or feelings that arise in response to specific events or situations, known as emotion causes. Together, emotions and their causes explain the causality between human emotion and intent. However, existing works treat emotion recognition and emotion cause extraction as two separate problems, ignoring their natural causality. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as mask prediction problems and unifies them with a causal prompt template. To differentiate the effects of each modality, UniMEEC proposes a multimodal causal prompt to probe modality-specific pre-trained knowledge and implements cross-task and cross-modality interactions under task-oriented settings. Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks and show consistent improvements over previous state-of-the-art methods.

Paper Structure

This paper contains 24 sections, 7 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: Illustration of the causal inference between emotion and emotion cause, which unifies MECPE and MERC tasks. "response" denotes the speaker's reaction to the event and "event" denotes the event that triggers emotion.
  • Figure 2: The overview of UniMEEC. The outputs "disgust" and "$u_{3}$" denote the emotion category and the emotion cause utterance ID of target utterance $u_{6}$, respectively.
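
The mask-prediction framing described in the abstract and in Figure 2 can be illustrated with a small text-only sketch. It builds a prompt with two mask slots for a target utterance, one for the emotion category and one for the cause utterance ID, then fills them with the Figure 2 outputs ("disgust" and $u_{3}$ for target $u_{6}$). The template wording, the `[MASK]` token, the function names, and the placeholder dialogue are all assumptions made for illustration; they are not the paper's actual prompt, and the sketch leaves out the audio and visual modalities entirely.

```python
from typing import List, Tuple

# Placeholder mask token (assumption); the real token depends on the language model used.
MASK = "[MASK]"


def build_causal_prompt(utterances: List[Tuple[str, str]], target: int) -> str:
    """Build a text-only prompt with two mask slots for the target utterance.

    utterances: (speaker, text) pairs, numbered u1..un in order.
    target: 1-based index of the target utterance.
    The template wording below is hypothetical, not taken from the paper.
    """
    if not 1 <= target <= len(utterances):
        raise ValueError("target index out of range")
    # Serialize the conversation with explicit utterance IDs.
    context = " ".join(
        f"u{i}: {speaker}: {text}"
        for i, (speaker, text) in enumerate(utterances, start=1)
    )
    # First slot: emotion category. Second slot: cause utterance ID.
    template = f"For u{target}, the emotion is {MASK} and it is caused by {MASK}."
    return f"{context} {template}"


def fill_prompt(prompt: str, emotion: str, cause: int) -> str:
    """Fill the two mask slots in order: emotion label, then cause utterance ID."""
    return prompt.replace(MASK, emotion, 1).replace(MASK, f"u{cause}", 1)


# Placeholder six-utterance dialogue (made up for illustration).
dialogue = [("A" if i % 2 else "B", f"utterance {i} text") for i in range(1, 7)]
prompt = build_causal_prompt(dialogue, target=6)
print(fill_prompt(prompt, emotion="disgust", cause=3))
```

In a real system a masked language model would score candidate fillers for each slot rather than receiving them as arguments; the point here is only that both tasks share one prompt, so the emotion slot and the cause slot are predicted in the same context.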