Unseen Fake News Detection Through Casual Debiasing

Shuzhi Gong, Richard Sinnott, Jianzhong Qi, Cecile Paris

TL;DR

The paper addresses unseen fake news detection under cross-domain distribution shifts by introducing FNDCD, a causal-debiasing framework that identifies environment-biased training samples through a posterior $p_{\theta}(e|\mathbf{A},\mathbf{X},\mathbf{y})$ and down-weights them during training. It combines a RoBERTa-based content encoder, a two-layer GCN propagation encoder, a Structure Estimator for edges, and posterior inference to obtain an ELBO objective $\mathcal{L}_{ELBO}=\mathcal{L}_{cl}+\mathcal{L}_{reg}+\mathcal{L}_{KL}$, enabling environment-aware reweighting of the loss. Training optimizes this objective within an EM framework, while testing uses $e=1$ (environment-independent) to infer $p(y|\mathbf{X},\mathbf{A},e=1)$. Empirical results on four non-overlapping-domain datasets (including cross-language settings) show that FNDCD achieves state-of-the-art performance in unseen-domain fake news detection and provides interpretability regarding data bias. The work offers a practical path to robust, cross-domain fake news detection by mitigating domain-specific biases without requiring labeled target-domain data during training.
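The reweighting idea can be illustrated with a minimal sketch. It assumes that a per-sample posterior probability of being environment-independent ($e=1$) has already been inferred and simply uses it as a loss weight. The function names `cross_entropy` and `reweighted_loss` and the normalization by the weight sum are illustrative assumptions, not the paper's implementation, and the regularization and KL terms of the ELBO are omitted.

```python
import math


def cross_entropy(prob_true_class):
    """Negative log-likelihood of the true label, clipped to avoid log(0)."""
    return -math.log(max(prob_true_class, 1e-12))


def reweighted_loss(probs_true, q_unbiased):
    """Weighted mean of per-sample cross-entropy losses.

    probs_true[i]: model probability assigned to the true label of sample i.
    q_unbiased[i]: ASSUMED posterior probability that sample i is
                   environment-independent (e = 1); biased samples get
                   values near 0 and therefore contribute little.
    """
    if len(probs_true) != len(q_unbiased):
        raise ValueError("inputs must have the same length")
    total_weight = sum(q_unbiased)
    if total_weight == 0:
        return 0.0  # every sample was judged fully biased
    weighted = sum(q * cross_entropy(p) for p, q in zip(probs_true, q_unbiased))
    return weighted / total_weight


# Hypothetical batch: the second sample is poorly classified and is
# also judged likely to be environment-biased.
probs = [0.9, 0.1]
plain = reweighted_loss(probs, [1.0, 1.0])      # ordinary mean loss
debiased = reweighted_loss(probs, [1.0, 0.1])   # biased sample down-weighted
```

With uniform weights the function reduces to the ordinary mean cross-entropy, so the only effect of the posterior is to shrink the contribution of samples judged environment-biased.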

Abstract

The widespread dissemination of fake news on social media poses significant risks, necessitating timely and accurate detection. However, existing methods struggle with unseen news due to their reliance on training data from past events and domains, leaving the challenge of detecting novel fake news largely unresolved. To address this, we identify biases in training data tied to specific domains and propose a debiasing solution FNDCD. Originating from causal analysis, FNDCD employs a reweighting strategy based on classification confidence and propagation structure regularization to reduce the influence of domain-specific biases, enhancing the detection of unseen fake news. Experiments on real-world datasets with non-overlapping news domains demonstrate FNDCD's effectiveness in improving generalization across domains.

Paper Structure

This paper contains 6 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Structure of the causal model used for training cross-domain fake news detection. $C$: Causal information that supports correct classification; $E$: Spurious environment-biased information that harms classification; $G$: Observed graph features; $Y$: Associated veracity label. Grey variables are unobserved; white variables are observed.
  • Figure 2: The structure of FNDCD. R is the module that reweights the loss according to the inferred environment variable $\mathbf{e}$.
  • Figure 3: Distribution of inferred environment variable (left: source Twitter dataset, right: source Weibo dataset).
  • Figure 4: Sensitivity to the parameter $p(e)$.