Heterogeneous Subgraph Transformer for Fake News Detection

Yuchen Zhang, Xiaoxiao Ma, Jia Wu, Jian Yang, Hao Fan

TL;DR

This work tackles fake news detection by integrating textual semantics with explicit structural information in a heterogeneous graph of news, entities, and topics. It introduces the Heterogeneous Subgraph Transformer (HeteroSGT), which combines a dual-attention news embedding, Random Walk with Restart (RWR)-based subgraph sampling, and a relative positional encoding within a heterogeneous self-attention framework to classify news authenticity. Across five real-world datasets, HeteroSGT consistently outperforms strong baselines on accuracy, macro-F1, and AUC, with ablation studies confirming the contribution of each component. The approach detects fake news by focusing on atypical subgraph patterns around each news item and leveraging both textual and structural cues.

Abstract

Fake news is pervasive on social media, inflicting substantial harm on public discourse and societal well-being. We investigate the explicit structural information and textual features of news pieces by constructing a heterogeneous graph that captures the relations among news topics, entities, and content. Our study reveals that fake news can be effectively detected from the atypical heterogeneous subgraphs centered on it, which encapsulate the essential semantics and intricate relations between news elements. However, because of this heterogeneity, exploring such subgraphs remains an open problem. To bridge the gap, this work proposes a heterogeneous subgraph transformer (HeteroSGT) to exploit subgraphs in our constructed heterogeneous graph. In HeteroSGT, we first employ a pre-trained language model to derive both word-level and sentence-level semantics. Random walk with restart (RWR) is then applied to extract a subgraph centered on each news item, which is fed to our proposed subgraph Transformer to quantify the item's authenticity. Extensive experiments on five real-world datasets demonstrate the superior performance of HeteroSGT over five baselines. Further case and ablation studies validate our motivation and demonstrate that the performance improvement stems from our specially designed components.
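
The subgraph extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_hetero_graph` and `rwr_sequence`, the typed-tuple node representation, and the choice to record the start node again whenever the walk restarts are all assumptions made for illustration. The example data reuses the topic and entity named in Figure 1.

```python
import random
from collections import defaultdict


def build_hetero_graph(news_entities, news_topics):
    """Build an undirected adjacency map over typed nodes.

    Nodes are (type, name) tuples such as ("news", "n1").
    This representation is a hypothetical choice, not taken from the paper.
    """
    adj = defaultdict(set)
    for news_id, entities in news_entities.items():
        for entity in entities:
            adj[("news", news_id)].add(("entity", entity))
            adj[("entity", entity)].add(("news", news_id))
    for news_id, topics in news_topics.items():
        for topic in topics:
            adj[("news", news_id)].add(("topic", topic))
            adj[("topic", topic)].add(("news", news_id))
    return adj


def rwr_sequence(adj, start, length, restart_prob, rng):
    """Sample a node sequence of `length` nodes by random walk with restart.

    At each step the walker jumps back to `start` with probability
    `restart_prob` (or when it has no neighbors); otherwise it moves to a
    uniformly chosen neighbor. Restarts are recorded in the sequence,
    which is an assumption of this sketch.
    """
    walk = [start]
    current = start
    while len(walk) < length:
        neighbors = sorted(adj.get(current, ()))  # sorted for reproducibility
        if not neighbors or rng.random() < restart_prob:
            current = start
        else:
            current = rng.choice(neighbors)
        walk.append(current)
    return walk


if __name__ == "__main__":
    graph = build_hetero_graph(
        {"n1": ["5G technology"], "n2": ["5G technology"]},
        {"n1": ["Spread of COVID-19"], "n2": ["Spread of COVID-19"]},
    )
    for node in rwr_sequence(graph, ("news", "n1"), 8, 0.3, random.Random(0)):
        print(node)
```

In this sketch a higher `restart_prob` keeps the sampled sequence close to the center news item, while a longer `length` lets it reach more distant entities, topics, and news; Figure 5 of the paper studies the impact of these two parameters.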

Paper Structure (37 sections, 13 equations, 6 figures, 9 tables, 2 algorithms)

Figures (6)

  • Figure 1: Fake news forms an atypical subgraph among seldom related news, entities, and topics. The fake news links the topic '#Spread of COVID-19' with entity '5G technology' in this case.
  • Figure 2: Overall framework of HeteroSGT. ⓐ News, entities, and topics are extracted from all news articles. ⓑ The pre-trained dual-attention module derives news embeddings considering both word-level and sentence-level semantics. ⓒ A heterogeneous graph $\mathcal{HG}$ is constructed to model the relations among news, entities, and topics, after which we initiate RWR ($\rightarrow$) centered on each news item to extract subgraphs. ⓓ HeteroSGT takes the RWR sequences as input and generates a subgraph representation $\bm{h}_{\mathcal{SG}_i}$ to train the MLP classifier with observed labels for detecting fake news.
  • Figure 3: ROC curves on five datasets.
  • Figure 4: Topic model evaluation.
  • Figure 5: Impact of RWR length and restart probability.
  • ...and 1 more figures