Multimodal Sentiment Analysis Based on Causal Reasoning
Fuhai Chen, Pengpeng Huang, Xuri Ge, Jie Huang, Zishuo Bao
TL;DR
This paper tackles modality bias in multimodal sentiment analysis by introducing CounterFactual Multimodal Sentiment Analysis (CF-MSA), which uses causal counterfactual reasoning to separate the direct effects of individual modalities from the joint multimodal signal. It formalizes a cause-effect model with a mediator and defines the total effect ($TE$), natural direct effect ($NDE$), and total indirect effect ($TIE$) to guide debiasing. CF-MSA is implemented as three branches (text, image, and text-image synthesis) whose outputs are fused by a learned function and optimized with a novel intermodal bias loss. Experiments on MVSA-Single and MVSA-Multiple show that CF-MSA reduces unimodal bias and achieves state-of-the-art performance under various bias-removal settings, with ablations validating the new objective $\mathcal{L}_{ti}$ and the importance of a non-uniform distribution for the learnable parameter $c$. The work provides a generalizable framework for debiased multimodal inference, and the authors plan to release code and datasets to support future research and practical sentiment analysis applications.
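The effect decomposition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it borrows the SUM fusion $h = \log\sigma(Z_t + Z_i + Z_{ti})$ from counterfactual VQA, treats the text and image branches as the direct (unimodal) paths and the text-image branch as the mediated path, and replaces a withheld branch with a constant `c` (a fixed scalar here; learnable in CF-MSA). The helpers `log_sigmoid`, `fuse_sum`, and `counterfactual_effects` are hypothetical names.

```python
import numpy as np


def log_sigmoid(x: np.ndarray) -> np.ndarray:
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def fuse_sum(z_t: np.ndarray, z_i: np.ndarray, z_ti: np.ndarray) -> np.ndarray:
    """Hypothetical fusion h(.): log-sigmoid of the summed branch logits (assumed, CF-VQA style)."""
    return log_sigmoid(z_t + z_i + z_ti)


def counterfactual_effects(z_t, z_i, z_ti, c=0.0):
    """Return (TE, NDE, TIE) for one sample.

    z_t, z_i, z_ti: class logits of the text, image, and text-image branches.
    c: constant that replaces a branch output when its input is
       counterfactually withheld (assumed scalar; learnable in practice).
    """
    z_ti = np.asarray(z_ti, dtype=float)
    cf = np.full_like(z_ti, c)              # counterfactual "no input" branch output
    factual = fuse_sum(z_t, z_i, z_ti)      # every input present
    baseline = fuse_sum(cf, cf, cf)         # every input withheld
    direct_only = fuse_sum(z_t, z_i, cf)    # mediated path blocked, unimodal paths kept
    te = factual - baseline                 # total effect
    nde = direct_only - baseline            # natural direct effect (unimodal shortcuts)
    tie = te - nde                          # total indirect effect = factual - direct_only
    return te, nde, tie


# Toy logits over three hypothetical sentiment classes (positive, neutral, negative).
z_text = np.array([3.0, -1.0, -2.0])    # text branch strongly favours class 0
z_image = np.array([-0.5, 0.5, 0.0])
z_joint = np.array([-1.0, 2.0, 0.0])    # joint branch favours class 1
te, nde, tie = counterfactual_effects(z_text, z_image, z_joint, c=0.0)
debiased_pred = int(np.argmax(tie))     # rank classes by the indirect effect only
print(te, nde, tie, debiased_pred)
```

Ranking classes by $TIE$ keeps the jointly mediated evidence and subtracts the unimodal shortcut captured by $NDE$, so the strongly worded text no longer dominates the toy prediction. Whether CF-MSA uses exactly this fusion and decision rule is an assumption of the sketch.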
Abstract
With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has attracted academic and industrial attention in recent years. However, multimodal sentiment analysis is affected by unimodal data bias; for example, explicit sentiment semantics in the text can mislead the model, lowering the accuracy of the final sentiment classification. In this paper, we propose a novel CounterFactual Multimodal Sentiment Analysis framework (CF-MSA) that uses causal counterfactual inference to model the causal structure of multimodal sentiment prediction. CF-MSA mitigates the direct effect of unimodal bias and preserves heterogeneity across modalities by assigning distinct treatment variables to each modality. In addition, considering the complementary information and differing biases of the modalities, we propose a new optimisation objective that effectively integrates the modalities and reduces the inherent bias of each. Experimental results on two public datasets, MVSA-Single and MVSA-Multiple, demonstrate that CF-MSA has superior debiasing capability and achieves new state-of-the-art performance. We will release the code and datasets to facilitate future research.
