
The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News

Isha Karn, David Jensen

TL;DR

This work interrogates whether graph neural networks truly leverage propagation structure for fake-news detection or simply inherit the signal in strong node features. By evaluating five GNNs against an MLP on real benchmarks (GossipCop and PolitiFact) and on controlled synthetic data, and by applying feature and edge perturbations, the authors show that structure often contributes little when features are informative. The real-world results show near-parity between MLPs and GNNs, with feature quality (e.g., BERT-derived embeddings) driving most of the performance; the synthetic experiments, however, demonstrate clear benefits from structure when features are weak or noisy. The findings argue for richer, more diverse datasets and more principled evaluation to properly assess structural reasoning in GNN-based fake-news detection models.
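Why structure helps mainly when features are weak can be illustrated with a toy node-level example. This is not the authors' synthetic generator, which the summary does not describe. It assumes a hypothetical two-block stochastic block model in which same-class nodes are far more likely to connect, plus node features whose class signal is small relative to Gaussian noise. One round of unweighted mean aggregation over each node and its neighbors (the core of GCN-style message passing, without learned weights) averages away noise contributed by mostly same-class neighbors. A structure-agnostic rule applied to the raw features cannot do this. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def make_sbm_graph(n_per_class, p_in, p_out, rng):
    """Two-block stochastic block model: returns labels and a symmetric 0/1 adjacency."""
    labels = np.repeat([0, 1], n_per_class)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    n = labels.size
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once, no self-loops
    adj = (upper | upper.T).astype(float)
    return labels, adj


def noisy_features(labels, dim, signal, noise, rng):
    """Class means are +/- signal on every dimension; Gaussian noise dominates."""
    sign = np.where(labels == 1, 1.0, -1.0)[:, None]
    return sign * signal + noise * rng.standard_normal((labels.size, dim))


def mean_aggregate(adj, x):
    """One round of mean aggregation over each node and its neighbors (no learned weights)."""
    a_hat = adj + np.eye(adj.shape[0])
    return (a_hat @ x) / a_hat.sum(axis=1, keepdims=True)


def accuracy(x, labels):
    """Classify by the sign of the feature sum, the optimal linear rule for this symmetric setup."""
    pred = (x.sum(axis=1) > 0).astype(int)
    return float((pred == labels).mean())


rng = np.random.default_rng(0)
labels, adj = make_sbm_graph(300, p_in=0.05, p_out=0.005, rng=rng)
x = noisy_features(labels, dim=8, signal=0.1, noise=1.0, rng=rng)
acc_raw = accuracy(x, labels)                       # structure-agnostic baseline
acc_agg = accuracy(mean_aggregate(adj, x), labels)  # after one propagation step
print(f"raw features: {acc_raw:.2f}  after aggregation: {acc_agg:.2f}")
```

With these hypothetical settings, aggregation should lift accuracy well above the raw-feature baseline. Raising `signal` until the raw features are nearly separable shrinks the gap, which mirrors the MLP/GNN parity reported when features are strong.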

Abstract

Graph neural networks (GNNs) are widely used to detect fake news by modeling both the content and the propagation structure of news articles on social media. We show that two of the most commonly used benchmark datasets, GossipCop and PolitiFact, are poorly suited to evaluating the utility of models that use propagation structure. Specifically, these datasets exhibit shallow, ego-like graph topologies that provide little or no ability to differentiate among modeling methods. We systematically benchmark five GNN architectures against a structure-agnostic multilayer perceptron (MLP) that uses the same node features. MLPs match or closely trail the GNNs, with performance gaps often within 1-2% and overlapping confidence intervals. To isolate the contribution of structure in these datasets, we run controlled experiments in which node features are shuffled or edge structures are randomized. Performance collapses under feature shuffling but remains stable under edge randomization, which suggests that structure plays a negligible role in these benchmarks. Structural analysis further reveals that over 75% of nodes are only one hop from the root, so the graphs exhibit minimal structural diversity. In contrast, on synthetic datasets where node features are noisy and structure is informative, GNNs significantly outperform MLPs. These findings provide strong evidence that widely used benchmarks do not meaningfully test the utility of modeling structural features, and they motivate the development of datasets with richer, more diverse graph topologies.
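The two perturbations can be sketched as follows. This assumes each propagation graph is stored as a dict with a node-feature matrix `x`, an `edge_index` array of shape (2, m) (the PyTorch Geometric convention), and a label `y`. The abstract does not say whether features are shuffled within or across graphs. This sketch assumes a dataset-wide permutation, which breaks the feature/label link while leaving every topology intact. The edge randomization is one simple null model (uniform rewiring at a fixed edge count) and is not necessarily the authors' procedure; the function names are hypothetical.

```python
import numpy as np


def shuffle_features(graphs, rng):
    """Permute node-feature rows across the whole dataset.

    Graph sizes, edges, and labels are kept; only the pairing between nodes and
    feature vectors is scrambled, which breaks any feature/label association.
    """
    pooled = np.concatenate([g["x"] for g in graphs], axis=0)
    pooled = pooled[rng.permutation(pooled.shape[0])]
    out, start = [], 0
    for g in graphs:
        n = g["x"].shape[0]
        out.append({**g, "x": pooled[start:start + n]})
        start += n
    return out


def randomize_edges(graphs, rng):
    """Rewire each graph uniformly at random, keeping its node count and edge count.

    Features and labels are kept, so any signal that survives must come from them.
    """
    out = []
    for g in graphs:
        n = g["x"].shape[0]
        m = g["edge_index"].shape[1]
        if n < 2 or m == 0:
            out.append(dict(g))  # nothing to rewire
            continue
        src = rng.integers(0, n, size=m)
        # An offset in [1, n - 1] guarantees dst != src, so no self-loops appear.
        dst = (src + rng.integers(1, n, size=m)) % n
        out.append({**g, "edge_index": np.stack([src, dst])})
    return out


rng = np.random.default_rng(0)
# Two toy star-shaped cascades: node 0 is the root post, the rest reply to it.
graphs = [
    {"x": rng.standard_normal((4, 3)), "edge_index": np.array([[0, 0, 0], [1, 2, 3]]), "y": 1},
    {"x": rng.standard_normal((3, 3)), "edge_index": np.array([[0, 0], [1, 2]]), "y": 0},
]
shuffled = shuffle_features(graphs, rng)
rewired = randomize_edges(graphs, rng)
```

Under this framing, a model that relies on structure should degrade under `randomize_edges`, and a model that relies on features should degrade under `shuffle_features`. The abstract reports the latter pattern for GossipCop and PolitiFact.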

Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Graph Size Distribution for Fake and Real News in GossipCop (left) and PolitiFact (right)
  • Figure 2: t-SNE projection of node features for GossipCop (left) and PolitiFact (right)
  • Figure 3: Validation accuracy over training epochs for six models (GCN, GAT, GraphSAGE, GCNFN, GNNCL, and MLP) on GossipCop (top) and PolitiFact (bottom). Each curve corresponds to a different input setting: Base, Shuffled Features, and Shuffled Edges.
  • Figure 4: Test accuracy over training epochs for four models (GCN, GAT, GraphSAGE, MLP) across synthetic datasets with controlled feature and structural signals. The top row shows performance on clean, class-separable Gaussian features (left: unshuffled; right: node-randomized), where node features are highly informative. The bottom row compares performance on noisy features (left) and in a structure-only setting (right), where node features are misleading or absent. GNNs consistently outperform MLPs when features are weak or deceptive, while MLPs perform competitively when features are strong and class-separable.
  • Figure 5: Normalized root degree distributions for GossipCop (left) and PolitiFact (right). Values closer to 1 indicate ego-like star graphs where most nodes connect directly to the root post. GossipCop exhibits more extreme centralization across its examples.
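The structural statistics behind Figure 5 and the one-hop finding can be computed per cascade with a breadth-first search. This sketch makes three assumptions: the root post is node 0, edges are treated as undirected, and the normalized root degree is the number of distinct root neighbors divided by n - 1, so a pure star scores 1. The paper's exact normalization may differ, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def build_neighbors(edge_index, n):
    """Undirected adjacency sets from a (2, m) edge array, ignoring self-loops."""
    nbrs = [set() for _ in range(n)]
    for u, v in zip(edge_index[0].tolist(), edge_index[1].tolist()):
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def hop_depths(edge_index, n, root=0):
    """BFS hop distance from the root; nodes unreachable from it get -1."""
    nbrs = build_neighbors(edge_index, n)
    depth = [-1] * n
    depth[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def one_hop_fraction(edge_index, n, root=0):
    """Share of non-root nodes sitting exactly one hop from the root."""
    if n < 2:
        return float("nan")
    depth = hop_depths(edge_index, n, root)
    return sum(d == 1 for d in depth) / (n - 1)


def normalized_root_degree(edge_index, n, root=0):
    """Distinct neighbors of the root divided by n - 1; equals 1.0 for a star."""
    if n < 2:
        return float("nan")
    return len(build_neighbors(edge_index, n)[root]) / (n - 1)


star = np.array([[0, 0, 0, 0], [1, 2, 3, 4]])   # every reply attaches to the root
chain = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])  # a single deep reply thread
print(one_hop_fraction(star, 5), normalized_root_degree(star, 5))    # 1.0 1.0
print(one_hop_fraction(chain, 5), normalized_root_degree(chain, 5))  # 0.25 0.25
```

Pooling node depths across all cascades would give a dataset-level one-hop share comparable in spirit to the 75% figure in the abstract, though how the authors aggregated it is not stated here.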