Tracking the Takes and Trajectories of English-Language News Narratives across Trustworthy and Worrisome Websites

Hans W. A. Hanley, Emily Okabe, Zakir Durumeric

TL;DR

The paper presents a scalable system to map how English-language news narratives travel across trustworthy and worrisome websites by embedding passages with encoder-based LLMs, grouping the embeddings into story clusters with DP-Means, and inferring inter-site relationships via NETINF, augmented with zero-shot stance detection. It demonstrates that reliable outlets significantly influence the topics and narratives across the ecosystem, while unreliable sites contribute distinctive stances and can seed propaganda networks on topics like Ukraine and vaccines. The approach yields a near-global perspective on the English-language news landscape, enabling journalists and fact-checkers to prioritize narratives for verification and to identify influential sources and coordination networks. The authors also provide an open-source release of weights, code, and crawled URLs to support reproducibility and further research in misinformation and propaganda analytics.

Abstract

Understanding how misleading and outright false information enters news ecosystems remains a difficult challenge that requires tracking how narratives spread across thousands of fringe and mainstream news websites. To do this, we introduce a system that utilizes encoder-based large language models and zero-shot stance detection to scalably identify and track news narratives and their attitudes across over 4,000 factually unreliable, mixed-reliability, and factually reliable English-language news websites. Running our system over an 18-month period, we track the spread of 146K news stories. Using network-based inference via the NETINF algorithm, we show that the paths of news narratives and the stances of websites toward particular entities can be used to uncover slanted propaganda networks (e.g., anti-vaccine and anti-Ukraine) and to identify the most influential websites in spreading these attitudes in the broader news ecosystem. We hope that increased visibility into our distributed news ecosystem can help with the reporting and fact-checking of propaganda and disinformation.

Paper Structure (23 sections, 9 equations, 17 figures, 14 tables)

Figures (17)

  • Figure 1: Our pipeline for identifying, labeling, and extracting the stance of story clusters from the daily publications of news websites.
  • Figure 2: Estimated partisanship of our websites, obtained via Bayesian regression on their stances toward articles' topics.
  • Figure 3: The most commonly discussed stories on reliable news websites labeled with their LLM-generated summaries.
  • Figure 4: Distribution of Ukraine and Vaccine bias across unreliable, mixed-reliability, and reliable news websites estimated by Bayesian regression models.
  • Figure 5: The percentage of each ecosystem's copied stories that came from each ecosystem, along with the change in the average time delay between websites copying/reposting the same narrative, depending on the combination of news ecosystems.
  • ...and 12 more figures