Finding Fake News Websites in the Wild

Leandro Araujo; Joao M. M. Couto; Luiz Felipe Nery; Isadora C. Rodrigues; Jussara M. Almeida; Julio C. S. Reis; Fabricio Benevenuto

TL;DR

The paper tackles the challenge of identifying fake-news websites by shifting from site-centric features to a user-behavior-driven, seed-based approach. It introduces a five-step workflow: start from a seed fake-news URL, trace the users who shared it, collect the URLs those users shared, rank the resulting websites using the $H$-Index, and iterate with new seeds. The workflow is validated on Twitter against Media Bias/Fact Check (MBFC) ground truth and extended to Brazil. Key findings are that the $H$-Index ranking yields strong early discoveries, that seed credibility significantly influences performance, and that a substantial fraction of discovered sites are highly influential within the ecosystem (e.g., about 60% lie in the top 15% by Open PageRank). In the Brazilian case, 75 fake-news sites were identified and their social-platform reach was quantified, suggesting that the approach is useful to researchers and authorities for cross-context misinformation monitoring and policy actions.
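The ranking step of the workflow can be illustrated with a minimal Python sketch. It assumes one plausible reading of the $H$-Index for websites: a domain has index h if at least h users each shared at least h URLs from it. The summary here does not spell out the exact definition, so this reading, the function names `h_index` and `rank_domains`, and the `(user_id, url)` input format are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from urllib.parse import urlparse


def h_index(counts):
    """Return the largest h such that at least h values in counts are >= h."""
    h = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def rank_domains(shares):
    """Rank domains by an assumed per-user H-Index.

    shares: iterable of (user_id, url) pairs (hypothetical input format).
    Returns a list of (domain, score) sorted by descending score,
    with ties broken alphabetically by domain.
    """
    # per_domain[domain][user] = number of URLs from domain shared by user
    per_domain = defaultdict(lambda: defaultdict(int))
    for user, url in shares:
        domain = urlparse(url).netloc.lower()
        if domain:  # skip strings that do not parse to a host
            per_domain[domain][user] += 1
    scores = {d: h_index(users.values()) for d, users in per_domain.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    example = [
        ("u1", "http://a.example/1"),
        ("u1", "http://a.example/2"),
        ("u2", "http://a.example/3"),
        ("u2", "http://a.example/4"),
        ("u3", "http://b.example/1"),
    ]
    print(rank_domains(example))
```

Under this reading, a domain scores highly only when several distinct users each share it repeatedly, which dampens the effect of a single hyperactive account. The top-ranked domains would then be the candidates for manual review and for the next round of seeds.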

Abstract

The battle against the spread of misinformation on the Internet is a daunting task faced by modern society. Fake news content is primarily distributed through digital platforms, with websites dedicated to producing and disseminating such content playing a pivotal role in this complex ecosystem. Therefore, these websites are of great interest to misinformation researchers. However, obtaining a comprehensive list of websites labeled as producers and/or spreaders of misinformation can be challenging, particularly in developing countries. In this study, we propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content, which are closely linked to users who share confirmed instances of fake news on social media. We validate our approach on Twitter by examining various execution modes and contexts. Our findings demonstrate the effectiveness of the proposed methodology in identifying misinformation websites, which can aid in gaining a better understanding of this phenomenon and enabling competent entities to tackle the problem in various areas of society.

Paper Structure (24 sections, 4 figures, 1 table)

Introduction
Related Work
Fake News Spreading Dynamics
Fake News Monetization
Network Aspects of Fake News Web Domains
Proposed Methodology
Validation Strategy
Finding Ground Truth
Setup
Sets of initial seeds
Automated execution and experimental setup
Twitter Execution
Selection of initial seeds
Experimental Results
Importance of the Initial Seed
...and 9 more sections

Figures (4)

Figure 1: Overview of the proposed methodology for identifying fake news websites in the wild.
Figure 2: Average rank 1 incidence of low credibility websites over 40 executions with varying ranking criteria and seed dataset.
Figure 3: (a) Performance for different ranking criteria when seeds are URLs of known low-credibility websites; (b) percentage of successfully selected websites for different ranking criteria when seeds are URLs of known low-credibility websites; and (c) recall for different ranking criteria, normalized by the best possible scenario.
Figure 4: Cumulative distribution function (CDF) of fake news websites considering their popularity based on PageRank.
