Graph with Sequence: Broad-Range Semantic Modeling for Fake News Detection

Junwei Yin, Min Gao, Kai Shu, Wentao Li, Yinqiu Huang, Zongwei Wang

TL;DR

BREAK tackles fake news detection by modeling broad-range semantics with a fully connected sentence graph while mitigating two noise types via dual denoising in a bi-level optimization. The inner module uses a sequence-based lower bound to refine the graph structure, producing a denoised representation, while the outer module aligns graph- and sequence-derived features through KL-divergence to yield $E_{str}$ and $E_{seq}$ that support robust detection. Empirical results on four real-world datasets show BREAK achieving state-of-the-art performance and strong generalization to evidence-enabled settings, outperforming baselines by several percentage points in F1. This approach offers a scalable, content-only framework that effectively captures long-range semantic interrelations for fake news detection with practical resilience to noise and varying article lengths.

Abstract

The rapid proliferation of fake news on social media threatens social stability, creating an urgent demand for more effective detection methods. While many promising approaches have emerged, most rely on content analysis with limited semantic depth, leading to suboptimal comprehension of news content. To address this limitation, capturing broader-range semantics is essential yet challenging, as it introduces two primary types of noise: fully connecting sentences in news graphs often adds unnecessary structural noise, while highly similar but authenticity-irrelevant sentences introduce feature noise, complicating the detection process. To tackle these issues, we propose BREAK, a broad-range semantics model for fake news detection that leverages a fully connected graph to capture comprehensive semantics while employing dual denoising modules to minimize both structural and feature noise. The semantic structure denoising module balances the graph's connectivity by iteratively refining it between two bounds: a sequence-based structure as the lower bound and a fully connected graph as the upper bound. This refinement uncovers label-relevant structures of semantic interrelations. Meanwhile, the semantic feature denoising module reduces noise from similar semantics by diversifying representations, aligning distinct outputs from the denoised graph and sequence encoders using KL-divergence to achieve feature diversification in high-dimensional space. The two modules are jointly optimized in a bi-level framework, enhancing the integration of denoised semantics into a comprehensive representation for detection. Extensive experiments across four datasets demonstrate that BREAK significantly outperforms existing fake news detection methods.
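
To make the two bounds and the KL-divergence term concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the sequence-based lower bound can be pictured as a chain linking each sentence to its neighbours in reading order, that the upper bound is a fully connected graph without self-loops, and that the graph and sequence features are turned into distributions with a softmax before the KL-divergence is computed. All function names are hypothetical, and the entrywise clamp merely stands in for the learned, iterative refinement described in the abstract.

```python
import math

def chain_adjacency(n):
    """Assumed lower bound: each sentence is linked only to its neighbours in reading order."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def full_adjacency(n):
    """Upper bound: every pair of distinct sentences is linked."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

def clamp_between_bounds(weights, lower, upper):
    """Illustrative stand-in for refinement: keep each edge weight within [lower, upper]."""
    return [
        [min(max(w, lo), hi) for w, lo, hi in zip(row_w, row_lo, row_hi)]
        for row_w, row_lo, row_hi in zip(weights, lower, upper)
    ]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Three sentences with made-up candidate edge weights.
candidate = [[0.5, 0.2, 0.9],
             [0.2, 0.5, 0.1],
             [0.9, 0.1, 0.5]]
refined = clamp_between_bounds(candidate, chain_adjacency(3), full_adjacency(3))

# Made-up graph- and sequence-derived feature vectors for one article.
e_str = softmax([1.0, 2.0, 3.0])
e_seq = softmax([3.0, 2.0, 1.0])
divergence = kl_divergence(e_str, e_seq)
```

In this toy setting, edges between adjacent sentences are always kept, self-loops are always dropped, and only the long-range edge between the first and third sentence keeps its candidate weight. The divergence is zero when the two feature distributions coincide and grows as they differ.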

Paper Structure

This paper contains 27 sections, 11 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of traditional and denoised sentence-level graph representations. (a) A news example with both textual and visual information. (b) A traditional sentence-level graph that introduces structural noise (irrelevant connections) and overlooks the semantics of individual sentences. (c) A heatmap showing excessive similarity hindering key sentence distinction. (d) A denoised sentence-level graph (our approach), minimizing irrelevant connections and enhancing node diversity for better key sentence identification.
  • Figure 2: Overview of BREAK with an edge weight inference example. The left part illustrates the overall process of BREAK, where $X_{seq}$ and $E_{seq}$ denote the sequence features that are used as the lower bound of semantics, and $R_{init}$ indicates the reference semantics that integrates structural and sequential semantics. $X_{node}$ and $F_{affi}$ denote the node features and the affinity matrix, respectively, and $E_{str}$ represents the structural features of the denoised graph. The right part depicts an example of edge weight inference with three nodes.
  • Figure 3: Ablation study on four datasets.
  • Figure 4: Hyperparameter sensitivity with respect to $\beta$.
  • Figure 5: A case study of structure denoising. (a) represents the visualization of sentences' weights. (b) represents the weight of each edge between sentences. (c) depicts the normalized in- and out-degrees of each node.
  • ...and 1 more figure