The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News
Isha Karn, David Jensen
TL;DR
This work interrogates whether graph neural networks truly leverage propagation structure for fake-news detection or simply inherit strong node features. By evaluating five GNNs against an MLP on real benchmarks (GossipCop and PolitiFact) and controlled synthetic data, and by applying feature and edge perturbations, the authors show that structure often contributes minimally when features are informative. The real-world results reveal near-parity between MLPs and GNNs, with feature quality (e.g., BERT-derived embeddings) driving most performance; synthetic experiments, however, demonstrate clear structure benefits when features are weak or noisy. The findings argue for richer, more diverse datasets and principled evaluation to properly assess structural reasoning in GNN-based fake-news detection models.
Abstract
Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most commonly used benchmark datasets, GossipCop and PolitiFact, are poorly suited to evaluating the utility of models that use propagation structure. Specifically, these datasets exhibit shallow, ego-like graph topologies that provide little or no ability to differentiate among modeling methods. We systematically benchmark five GNN architectures against a structure-agnostic multilayer perceptron (MLP) that uses the same node features. We show that MLPs match or closely trail the performance of GNNs, with performance gaps often within 1-2% and overlapping confidence intervals. To isolate the contribution of structure in these datasets, we conduct controlled experiments in which node features are shuffled or edge structures are randomized. We find that performance collapses under feature shuffling but remains stable under edge randomization, which suggests that structure plays a negligible role in these benchmarks. Structural analysis further reveals that over 75% of nodes are only one hop from the root, so the graphs exhibit minimal structural diversity. In contrast, on synthetic datasets where node features are noisy and structure is informative, GNNs significantly outperform MLPs. These findings provide strong evidence that widely used benchmarks do not meaningfully test the utility of modeling structural features, and they motivate the development of datasets with richer, more diverse graph topologies.
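The two perturbations and the one-hop statistic described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 2 x E `edge_index` layout, the choice of node 0 as the root, and the exact shuffling and rewiring schemes are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def shuffle_features(x, rng):
    """Permute the rows of the node-feature matrix so features no longer match their nodes."""
    return x[rng.permutation(x.shape[0])]


def randomize_edges(edge_index, num_nodes, rng):
    """Replace every edge with a random (source, target) pair, keeping the edge count.

    Assumes num_nodes >= 2.
    """
    num_edges = edge_index.shape[1]
    src = rng.integers(0, num_nodes, size=num_edges)
    # An offset in [1, num_nodes) guarantees target != source (no self-loops).
    dst = (src + rng.integers(1, num_nodes, size=num_edges)) % num_nodes
    return np.stack([src, dst])


def one_hop_fraction(edge_index, num_nodes, root=0):
    """Fraction of non-root nodes whose shortest-path distance to the root is exactly 1.

    Edges are treated as undirected. Assumes num_nodes >= 2.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for s, t in edge_index.T.tolist():
        neighbors[s].append(t)
        neighbors[t].append(s)
    # Breadth-first search from the root records each reachable node's depth.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    one_hop = sum(1 for d in depth.values() if d == 1)
    return one_hop / (num_nodes - 1)


# Hypothetical toy cascade: root 0 has four direct children, and node 5 hangs off node 1.
rng = np.random.default_rng(0)
edge_index = np.array([[0, 0, 0, 0, 1],
                       [1, 2, 3, 4, 5]])
x = np.arange(12.0).reshape(6, 2)

x_shuffled = shuffle_features(x, rng)
edges_random = randomize_edges(edge_index, num_nodes=6, rng=rng)
frac = one_hop_fraction(edge_index, num_nodes=6)  # 4 of the 5 non-root nodes are one hop away
```

In an evaluation along these lines, each model would be retrained or re-evaluated on the perturbed inputs: a large drop under `shuffle_features` combined with a negligible drop under `randomize_edges` would indicate that the model relies on features rather than structure.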
