Detection and Discovery of Misinformation Sources using Attributed Webgraphs
Peter Carragher, Evan M. Williams, Kathleen M. Carley
TL;DR
The paper tackles the problem of transient misinformation sources by shifting from article- or social-media-based signals to domain-level reliability, using attributed webgraphs and SEO features. It introduces MBFC*, a new multi-source labeled webgraph dataset, and applies graph neural networks to predict reliability and political bias, achieving an F1 score of 0.96 on the PoliticalNews benchmark and providing a competitive, content-agnostic discovery mechanism for new unreliable sources. The work demonstrates that outlink structures and SEO context offer strong predictive power, surpassing the prior state of the art on key tasks. It also presents a graph-based discovery pipeline that identifies candidate misinformation domains with substantial reliability and bias signals, while acknowledging limitations such as seed bias and domain survivability. This approach enables scalable, language- and content-agnostic misinformation research with practical implications for detection and platform-level moderation.
Abstract
Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources often exhibit transient behavior, which makes many such labeled lists obsolete over time. We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. We demonstrate the success of graph neural networks in detecting news site reliability using these attributed webgraphs, and show that our baseline news site reliability classifier outperforms current SoTA methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.
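To make the idea of an attributed webgraph concrete, the following is a minimal sketch, not the paper's model or dataset. It assumes each domain carries a small vector of SEO-style attributes and that domains are connected by directed outlink edges. It then performs one round of mean aggregation over each domain and its outlink neighbors, which is the basic operation that message-passing graph neural networks build on. The domain names, the two-element feature vectors, and the `aggregate` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: domain names and attribute values are invented.
# Each vector stands in for SEO-style attributes of a domain.
features = {
    "news-a.example": [0.9, 0.1],
    "news-b.example": [0.2, 0.8],
    "blog-c.example": [0.4, 0.6],
}

# Directed outlink edges (source domain -> target domain).
outlinks = [
    ("news-a.example", "news-b.example"),
    ("news-a.example", "blog-c.example"),
    ("news-b.example", "blog-c.example"),
]


def aggregate(features, edges):
    """One round of mean aggregation over a node and its outlink neighbors."""
    neighbors = defaultdict(list)
    for src, dst in edges:
        neighbors[src].append(dst)

    out = {}
    for node, vec in features.items():
        # Include the node's own vector so isolated nodes keep their features.
        group = [vec] + [features[n] for n in neighbors[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


smoothed = aggregate(features, outlinks)
```

After this step, each domain's vector mixes in the attributes of the domains it links to, so a classifier trained on `smoothed` sees link context as well as the domain's own attributes. A real graph neural network would add learned weights and nonlinearities and would typically stack several such rounds.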
