Finding Fake News Websites in the Wild
Leandro Araujo, Joao M. M. Couto, Luiz Felipe Nery, Isadora C. Rodrigues, Jussara M. Almeida, Julio C. S. Reis, Fabricio Benevenuto
TL;DR
The paper tackles the challenge of identifying fake-news websites by shifting from site-centric features to a seed-based approach driven by user behavior. It introduces a five-step workflow: (1) start from a seed fake-news URL, (2) trace the users who shared it, (3) collect the URLs those users shared, (4) rank the websites behind those URLs using the $H$-Index, and (5) iterate with new seeds. The method is validated on Twitter against Media Bias/Fact Check (MBFC) ground truth and then extended to Brazil. Key findings show that the $H$-Index ranking surfaces fake-news sites early in the ranking, that the credibility of the seed significantly affects performance, and that a substantial fraction of the discovered sites are highly influential within the ecosystem (e.g., about 60% fall within the top 15% of websites by Open PageRank). The Brazilian case study demonstrates practical relevance: 75 fake-news sites were identified and their reach on social platforms was quantified. This suggests the method can help researchers and authorities monitor misinformation across contexts and inform policy actions.
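The ranking and iteration steps can be illustrated with a short Python sketch. This section does not define how the $H$-Index is applied to websites. The sketch therefore assumes, by analogy with the bibliometric h-index, that a site scores h when at least h distinct users each shared at least h URLs from it. The helpers `domain_of`, `rank_websites`, and `discover` are hypothetical, as are the data-access callables `users_who_shared` and `urls_shared_by` (stand-ins for whatever collection step retrieves shares), the `top_k` cutoff, and the rule for choosing next-round seeds. None of these is the authors' implementation or any platform's API.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def h_index(counts: list[int]) -> int:
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def rank_websites(shares: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank domains by an H-Index computed over the users who shared them.

    Assumption (not from the paper): a domain scores h when at least h
    distinct users each shared at least h URLs from that domain.
    """
    per_domain: dict[str, Counter] = defaultdict(Counter)
    for user, urls in shares.items():
        for url in urls:
            per_domain[domain_of(url)][user] += 1
    scores = {d: h_index(list(c.values())) for d, c in per_domain.items()}
    # Highest index first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def discover(
    seed_urls: Iterable[str],
    users_who_shared: Callable[[str], Iterable[str]],  # hypothetical data source
    urls_shared_by: Callable[[str], Iterable[str]],    # hypothetical data source
    rounds: int = 2,
    top_k: int = 3,
) -> list[str]:
    """Hypothetical seed-expansion loop: seeds -> users -> URLs -> ranking."""
    seeds = list(seed_urls)
    known = {domain_of(u) for u in seeds}  # seed domains are not "discoveries"
    discovered: list[str] = []
    for _ in range(rounds):
        shares: dict[str, list[str]] = {}
        for seed in seeds:
            for user in users_who_shared(seed):
                if user not in shares:  # collect each user's URLs only once
                    shares[user] = list(urls_shared_by(user))
        new = [d for d, _ in rank_websites(shares) if d not in known][:top_k]
        if not new:
            break
        discovered.extend(new)
        known.update(new)
        # Assumption: next round's seeds are the collected URLs from new domains.
        seeds = list(dict.fromkeys(
            u for urls in shares.values() for u in urls if domain_of(u) in new
        ))
    return discovered


if __name__ == "__main__":
    toy_shares = {
        "u1": ["https://fake-a.com/1", "https://fake-a.com/2", "https://news.org/x"],
        "u2": ["https://www.fake-a.com/3", "https://fake-a.com/4", "https://fake-b.net/1"],
        "u3": ["https://fake-b.net/2", "https://news.org/y"],
    }
    print(rank_websites(toy_shares))
    # [('fake-a.com', 2), ('fake-b.net', 1), ('news.org', 1)]
```

The h-index-style score rewards sites that many users share repeatedly, rather than sites that a single heavy sharer posts many times. A real pipeline would likely also filter out mainstream domains (e.g., large news outlets or link shorteners) before ranking. That filter is omitted here.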
Abstract
The battle against the spread of misinformation on the Internet is a daunting task faced by modern society. Fake news content is primarily distributed through digital platforms, with websites dedicated to producing and disseminating such content playing a pivotal role in this complex ecosystem. Therefore, these websites are of great interest to misinformation researchers. However, obtaining a comprehensive list of websites labeled as producers and/or spreaders of misinformation can be challenging, particularly in developing countries. In this study, we propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content, which are closely linked to users who share confirmed instances of fake news on social media. We validate our approach on Twitter by examining various execution modes and contexts. Our findings demonstrate the effectiveness of the proposed methodology in identifying misinformation websites, which can aid in gaining a better understanding of this phenomenon and enabling competent entities to tackle the problem in various areas of society.
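Validation compares discovered sites against a labeled list; the TL;DR names MBFC as the ground truth. This section does not name the evaluation metric. A standard way to quantify how early a ranking surfaces labeled sites is precision at k, sketched below in Python. The function names and toy domains are hypothetical, and this is not claimed to be the paper's exact metric.

```python
def precision_at_k(ranked: list[str], labeled_fake: set[str], k: int) -> float:
    """Fraction of the top-k ranked domains found in a ground-truth set.

    Divides by k (the standard definition), so a ranking shorter than k
    is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(d in labeled_fake for d in ranked[:k])
    return hits / k


def precision_curve(ranked: list[str], labeled_fake: set[str]) -> list[float]:
    """Precision at every cutoff 1..len(ranked); high early values mean
    labeled sites are concentrated near the top of the ranking."""
    return [precision_at_k(ranked, labeled_fake, k) for k in range(1, len(ranked) + 1)]
```

For example, with the hypothetical ranking `["fake-a.com", "fake-b.net", "news.org", "fake-c.info"]` and labels `{"fake-a.com", "fake-b.net", "fake-c.info"}`, precision at 2 is 1.0 and precision at 4 is 0.75.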
