Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang
TL;DR
The paper tackles biased decisions in multimodal sentiment analysis by reframing bias mitigation as a causal counterfactual problem. It introduces MCIS, a training-free framework that uses a tailored structural causal graph and two counterfactual embeddings to purify label and context biases at inference via backdoor adjustment. By subtracting purified bias effects from the factual prediction, MCIS yields unbiased inferences and demonstrates consistent improvements across MOSI and MOSEI benchmarks with reduced computational overhead. The approach is model-agnostic, compatible with diverse fusion strategies, and advances robust sentiment understanding in the presence of dataset biases. Future work includes modality reconstruction to address missing modalities while preserving debiasing advantages.
Abstract
Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as the visual, language, and audio modalities. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases can mislead models into relying on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a Multimodal Counterfactual Inference Sentiment (MCIS) analysis framework based on causality rather than conventional likelihood. Concretely, we first formulate a causal graph to discover harmful biases from already-trained vanilla models. In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases. Then, MCIS can make unbiased decisions from biased observations by comparing factual and counterfactual outcomes. We conduct extensive experiments on several standard MSA benchmarks. Qualitative and quantitative results show the effectiveness of the proposed framework.
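
The inference-time idea summarized above, comparing a factual outcome with two counterfactual outcomes and removing the bias effects, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `debias_scores`, the weights `alpha` and `beta`, and the toy score vectors are all hypothetical, and the sketch assumes the two counterfactual outputs (one capturing label bias, one capturing context bias) have already been obtained from an already-trained model.

```python
from typing import List, Sequence


def debias_scores(
    factual: Sequence[float],
    label_bias: Sequence[float],
    context_bias: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight on the label-bias effect
    beta: float = 1.0,   # hypothetical weight on the context-bias effect
) -> List[float]:
    """Subtract two scaled counterfactual bias outputs from the factual output."""
    if not (len(factual) == len(label_bias) == len(context_bias)):
        raise ValueError("all score vectors must have the same length")
    return [
        f - alpha * lb - beta * cb
        for f, lb, cb in zip(factual, label_bias, context_bias)
    ]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy scores over three classes: negative, neutral, positive (made-up values).
factual = [1.0, 0.5, 2.0]        # model output on the real multimodal input
label_bias = [0.0, 0.0, 1.0]     # assumed output of the label-bias counterfactual
context_bias = [0.0, 0.0, 0.75]  # assumed output of the context-bias counterfactual

debiased = debias_scores(factual, label_bias, context_bias)
print(argmax(factual), argmax(debiased))
```

In this toy case the factual scores favor the positive class only because both bias terms push in that direction; once they are subtracted, the prediction changes. How the counterfactual outputs are actually produced, and how the effects are weighted, is defined in the paper itself and is not reproduced here.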
