Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention

Moyang Liu, Kaiying Yan, Yukun Liu, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Chenxing Li

TL;DR

The paper tackles multimodal fake news detection by modeling three cross-modal confounders with a unified Structural Causal Model (SCM) and introducing CIMDD, a framework that applies do-calculus-based interventions to block spurious correlations. CIMDD comprises three dedicated modules: Linguistic Backdoor Deconfounded Reasoning (LBDR), Visual Frontdoor Deconfounded Reasoning (VFDR), and Cross-modal Joint Deconfounded Reasoning (CJDR). It uses the Normalized Weighted Geometric Mean (NWGM) to approximate interventional quantities such as $P(Y|do(X))$. On the FakeSV and FVC datasets, CIMDD improves accuracy over state-of-the-art methods by 4.27% and 4.80%, respectively; ablations confirm the contribution of each module, and further experiments indicate robust generalization across diverse multimodal scenarios. The approach offers a principled way to mitigate cross-modal confounding in practice.
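
For orientation, the three tools named above have well-known textbook forms, reproduced here as a sketch. The symbols $Z$ (an observed confounder), $M$ (a mediator between $X$ and $Y$), and $g$ (a classifier's logit function) are generic notation assumed for illustration; the paper's own equations may use different symbols and additional terms.

```latex
% Backdoor adjustment: Z is an observed confounder of X and Y
P(Y \mid do(X)) = \sum_{z} P(Y \mid X, z)\, P(z)

% Frontdoor adjustment: M is a mediator on the path X -> M -> Y
P(Y \mid do(X)) = \sum_{m} P(m \mid X) \sum_{x'} P(Y \mid m, x')\, P(x')

% NWGM approximation for a softmax classifier with logits g(X, z)
\mathbb{E}_{z}\big[\mathrm{softmax}(g(X, z))\big] \approx \mathrm{softmax}\big(\mathbb{E}_{z}[g(X, z)]\big)
```

The NWGM step moves the expectation over the confounder inside the softmax, so a network can be evaluated once on an averaged representation instead of once per confounder value.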

Abstract

The rapid growth of social media has led to the widespread dissemination of fake news across multiple content forms, including text, images, audio, and video. Traditional unimodal detection methods fall short in addressing complex cross-modal manipulations; as a result, multimodal fake news detection has emerged as a more effective solution. However, existing multimodal approaches, especially in the context of fake news detection on social media, often overlook the confounders hidden within complex cross-modal interactions, leading models to rely on spurious statistical correlations rather than genuine causal mechanisms. In this paper, we propose the Causal Intervention-based Multimodal Deconfounded Detection (CIMDD) framework, which systematically models three types of confounders via a unified Structural Causal Model (SCM): (1) Lexical Semantic Confounder (LSC); (2) Latent Visual Confounder (LVC); (3) Dynamic Cross-Modal Coupling Confounder (DCCC). To mitigate the influence of these confounders, we specifically design three causal modules based on backdoor adjustment, frontdoor adjustment, and cross-modal joint intervention to block spurious correlations from different perspectives and achieve causal disentanglement of representations for deconfounded reasoning. Experimental results on the FakeSV and FVC datasets demonstrate that CIMDD significantly improves detection accuracy, outperforming state-of-the-art methods by 4.27% and 4.80%, respectively. Furthermore, extensive experimental results indicate that CIMDD exhibits strong generalization and robustness across diverse multimodal scenarios.
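
To make the contrast between spurious correlation and causal effect concrete, the following is a minimal sketch of backdoor adjustment on a toy discrete distribution. It is not the CIMDD implementation: the variables `X` (a surface feature), `Z` (a confounder), and `Y` (the fake/real label), along with every probability value, are hypothetical and chosen so that `X` has no causal effect on `Y` at all.

```python
import numpy as np

# Hypothetical toy model (all numbers are illustrative assumptions):
# Z is a binary confounder that influences both X and Y.
p_z = np.array([0.5, 0.5])                 # P(Z=z)
p_x_given_z = np.array([[0.9, 0.1],        # row z=0: P(X=0|Z=0), P(X=1|Z=0)
                        [0.1, 0.9]])       # row z=1: P(X=0|Z=1), P(X=1|Z=1)
# P(Y=1 | X=x, Z=z), indexed [x, z]. Identical rows mean X has no effect on Y.
p_y1_given_xz = np.array([[0.2, 0.8],
                          [0.2, 0.8]])


def observational(x: int) -> float:
    """P(Y=1 | X=x): weights each z by P(z | X=x), so confounding leaks in."""
    joint_xz = p_x_given_z[:, x] * p_z          # P(X=x, Z=z) for each z
    p_z_given_x = joint_xz / joint_xz.sum()     # Bayes' rule
    return float(p_y1_given_xz[x] @ p_z_given_x)


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)) via backdoor adjustment: weights each z by P(z)."""
    return float(p_y1_given_xz[x] @ p_z)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: P(Y=1|X=x)={observational(x):.2f}, "
              f"P(Y=1|do(X=x))={interventional(x):.2f}")
```

In this toy setup the observational probabilities differ sharply between `x=0` and `x=1` (0.26 versus 0.74), while the interventional probabilities are both 0.50. A classifier trained on the observational statistics would treat `X` as predictive even though it has no causal influence, which is the kind of shortcut the abstract attributes to confounders.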

Paper Structure

This paper contains 20 sections, 20 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed three types of confounders (LSC, LVC, DCCC) in the context of multimodal fake news detection on social media.
  • Figure 2: Illustration of the structural causal model for multimodal fake news detection in the context of social media.
  • Figure 3: Architecture of the Causal Intervention-based Multimodal Deconfounded Detection framework CIMDD.
  • Figure 4: Illustration of causal intervention to block backdoor paths and eliminate confounding bias.
  • Figure 5: The specific deconfounding processes of the three causal modules (LBDR, VFDR, CJDR).
  • ...and 1 more figure