Multi-level Conflict-Aware Network for Multi-modal Sentiment Analysis

Yubo Gao, Haotian Wu, Lei Zhang

TL;DR

This work tackles multimodal sentiment analysis by addressing not only cross-modal alignment but also conflicts between bimodal pairs. It proposes MCAN, a two-branch network in which the main branch progressively fuses unimodal and bimodal representations via Micro-MSIN and Macro-MSIN, using SVD to decompose them into alignment and conflict constituents. The conflict modeling branch (Micro-CACA and Macro-CACA) explicitly models conflicts and enforces discrepancy constraints at both the representation and prediction levels, avoiding reliance on unstable generated labels. Joint training of the two branches yields improved performance on CMU-MOSI and CMU-MOSEI, with ablations confirming the importance of the conflict-focused components and the singular-value truncation strategy. By explicitly balancing alignment against conflict, the approach aims at more robust, fine-grained multimodal fusion, which could benefit real-world MSA systems.

Abstract

Multimodal Sentiment Analysis (MSA) aims to recognize human emotions by exploiting textual, acoustic, and visual modalities, so making full use of the interactions between modalities is a central challenge. These interactions have both alignment and conflict aspects. Current works mainly emphasize alignment and the inherent differences between individual modalities, neglecting potential conflicts between bimodal combinations. Additionally, conflict modeling methods based on multi-task learning often rely on unstable generated labels. To address these challenges, we propose a novel multi-level conflict-aware network (MCAN) for multimodal sentiment analysis, which progressively segregates alignment and conflict constituents from unimodal and bimodal representations and further exploits the conflict constituents with a conflict modeling branch. In the conflict modeling branch, we apply discrepancy constraints at both the representation and predicted-output levels, avoiding dependence on generated labels. Experimental results on the CMU-MOSI and CMU-MOSEI datasets demonstrate the effectiveness of the proposed MCAN.
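
The idea of separating a representation into two constituents by truncating singular values can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `split_by_singular_values`, the cutoff `k`, the input shape, and the choice to call the top-`k` part "aligned" and the residual "conflict" are all assumptions made for illustration, since this page does not say how MCAN assigns singular components.

```python
import numpy as np


def split_by_singular_values(x, k):
    """Split x into a rank-k part and its residual via truncated SVD.

    Hypothetical helper: which part corresponds to "alignment" and which
    to "conflict" in MCAN is an assumption, not taken from the paper.
    """
    # Thin SVD: x = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    k = max(0, min(k, s.shape[0]))
    # Reconstruct using only the k largest singular values.
    top = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Everything carried by the remaining singular values.
    rest = x - top
    return top, rest


rng = np.random.default_rng(0)
# Assumed shape: (sequence length, feature dimension).
features = rng.standard_normal((8, 16))
aligned, conflict = split_by_singular_values(features, k=3)

# The two parts sum back to the input and are mutually orthogonal.
print(np.allclose(aligned + conflict, features))
print(np.allclose(aligned.T @ conflict, 0.0))
```

Because the two parts are built from disjoint sets of singular vectors, they are orthogonal and add up exactly to the original matrix, which is what makes a truncation threshold a natural knob to ablate.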

Paper Structure

This paper contains 14 sections, 10 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: The overall framework of MCAN, MSIN and CACA