Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

Dingkang Yang; Mingcheng Li; Dongling Xiao; Yang Liu; Kun Yang; Zhaoyu Chen; Yuzheng Wang; Peng Zhai; Ke Li; Lihua Zhang

TL;DR

The paper tackles biased decisions in multimodal sentiment analysis by reframing bias mitigation as a causal counterfactual problem. It introduces MCIS, a training-free framework that uses a tailored structural causal graph and two counterfactual embeddings to purify label and context biases at inference via backdoor adjustment. By subtracting purified bias effects from the factual prediction, MCIS yields unbiased inferences and demonstrates consistent improvements across MOSI and MOSEI benchmarks with reduced computational overhead. The approach is model-agnostic, compatible with diverse fusion strategies, and advances robust sentiment understanding in the presence of dataset biases. Future work includes modality reconstruction to address missing modalities while preserving debiasing advantages.

Abstract

Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as visual, language, and audio. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a Multimodal Counterfactual Inference Sentiment (MCIS) analysis framework based on causality rather than conventional likelihood. Concretely, we first formulate a causal graph to discover harmful biases from already-trained vanilla models. In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases. Then, MCIS can make unbiased decisions from biased observations by comparing factual and counterfactual outcomes. We conduct extensive experiments on several standard MSA benchmarks. Qualitative and quantitative results show the effectiveness of the proposed framework.
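The comparison of factual and counterfactual outcomes described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `debias_scores`, the scaling factors `alpha` and `beta`, and the toy score values are all assumptions made for illustration. The sketch only shows the general idea of subtracting two bias-only (counterfactual) outputs from the factual output of an already-trained model.

```python
from typing import List, Sequence


def debias_scores(
    factual: Sequence[float],
    label_bias: Sequence[float],
    context_bias: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight on the label-bias term
    beta: float = 1.0,   # hypothetical weight on the context-bias term
) -> List[float]:
    """Subtract two counterfactual (bias-only) score vectors from the factual one."""
    if not (len(factual) == len(label_bias) == len(context_bias)):
        raise ValueError("all score vectors must have the same length")
    return [
        f - alpha * l - beta * c
        for f, l, c in zip(factual, label_bias, context_bias)
    ]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy binary example, class order [negative, positive]. Values are made up.
factual = [0.4, 1.0]       # model output on the real multimodal input
label_bias = [0.0, 0.5]    # output under the label-bias counterfactual
context_bias = [0.0, 0.3]  # output under the context-bias counterfactual

debiased = debias_scores(factual, label_bias, context_bias)
print("factual prediction:", argmax(factual))
print("debiased prediction:", argmax(debiased))
```

In this toy case the positive class wins on the factual scores only because both bias terms favor it; once they are subtracted, the negative class has the higher score. How the counterfactual inputs are built and how the bias terms are weighted is specified in the paper's methodology, not here.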

Paper Structure (16 sections, 8 equations, 6 figures, 5 tables)

Introduction
Related Work
Methodology
Framework Overview
Structural Causal Graph in MSA
Label Bias Purification
Context Bias Purification
Bias Elimination Strategy
Experiments
Datasets and Evaluation Metrics
Model Zoo
Implementation Details
Comparison with State-of-the-art Methods
Ablation Studies
Qualitative Analysis
...and 1 more sections

Figures (6)

Figure 1: The distribution of (a) sentiment labels and (b) several context words from the training set on the MOSI dataset (Zadeh et al., 2016).
Figure 2: An example of multimodal sentiment analysis. (a) Likelihood-based biased prediction from the re-implemented model DMD (Li et al., 2023). (b) Unbiased prediction from the same model in the proposed framework. Binary classification results are shown for illustration.
Figure 3: (a) The tailored causal graph for MSA. (b) The simplified causal graph for MSA. (c) Comparison between factual MSA and counterfactual MSA. White nodes are at the value $M = m$ while gray nodes are at the value $M = \hat{m}$ or $M = \tilde{m}$.
Figure 4: (a) The biased learning of MSA models follows the factual training. (b) The architecture of our MCIS framework. MCIS compares factual and counterfactual outcomes for different multimodal input treatments. By subtracting the label and context biases, MCIS can achieve unbiased predictions from biased observations.
Figure 5: Case study of counterfactual learning on MOSI and MOSEI. We report the binary evaluation results from DMD (Li et al., 2023) with our MCIS for intuitive display. Label/Context Word Distribution: the imbalanced distribution of sentiment labels and context words in the positive and negative categories comes from the training set.
...and 1 more figures
