Auditing LLM Editorial Bias in News Media Exposure

Marco Minici, Cristian Consonni, Federico Cinus, Giuseppe Manco

TL;DR

The paper addresses how LLMs with web access function as news gatekeepers, potentially shaping public discourse through opaque editorial choices. It develops a general black-box auditing framework to compare LLMs (GPT-4o-Mini, Claude-3.7-Sonnet, Gemini-2.0-Flash) against Google News across 24 sociopolitical topics, using metrics for outlet diversity, attention distribution, category composition, ideological orientation, and factual reliability. The study combines URL-domain extraction with external credibility/ideology signals (MBFC and PSL) and quantifies attention inequality via the Gini index $G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}|x_i - x_j|}{2n^2\bar{x}}$, revealing that LLMs tend to surface fewer outlets and allocate attention more unevenly, with system-specific biases toward right-leaning (GPT-4o-Mini, Claude) or left-leaning (Gemini) sources and varying factuality profiles. These findings demonstrate emergent agentic editorial policies in AI news intermediaries and underscore the need for governance, transparency, and pluralism as LLMs become integral to digital information ecosystems.
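The Gini formula above can be made concrete with a small sketch. It assumes that $x_i$ is the number of times outlet $i$ is cited by one system, $n$ is the number of distinct outlets, and $\bar{x}$ is the mean count; the domain names and counts below are hypothetical toy data, not results from the paper, and the function name `gini` is chosen here for illustration.

```python
from collections import Counter


def gini(counts):
    """Gini index via the pairwise absolute-difference formula.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
    Returns 0.0 for empty input or when all counts are zero.
    """
    x = [float(c) for c in counts]
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    # Double sum over all ordered pairs (i, j), as in the formula.
    pairwise = sum(abs(a - b) for a in x for b in x)
    return pairwise / (2 * n * n * mean)


# Hypothetical toy data: domains cited in one system's answers.
cited_domains = ["a.example", "a.example", "a.example", "b.example", "c.example"]
domain_counts = Counter(cited_domains)

print(round(gini(domain_counts.values()), 4))
```

A value of 0 means every outlet receives equal attention; with this formula the maximum for $n$ outlets is $(n-1)/n$, reached when a single outlet receives all citations. The double loop is quadratic in $n$, which is fine for a few hundred outlets but would call for the sorted-rank form of the index on larger inputs.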

Abstract

Large Language Models (LLMs) increasingly act as gateways to web content, shaping how millions of users encounter online information. Unlike traditional search engines, whose retrieval and ranking mechanisms are well studied, the selection processes of web-connected LLMs add layers of opacity to how answers are generated. By determining which news outlets users see, these systems can influence public opinion, reinforce echo chambers, and pose risks to civic discourse and public trust. This work extends two decades of research in algorithmic auditing to examine how LLMs function as news engines. We present the first audit comparing three leading agents, GPT-4o-Mini, Claude-3.7-Sonnet, and Gemini-2.0-Flash, against Google News, asking: "How do LLMs differ from traditional aggregators in the diversity, ideology, and reliability of the media they expose to users?" Across 24 global topics, we find that, compared to Google News, LLMs surface significantly fewer unique outlets and allocate attention more unevenly. Relative to the same baseline, GPT-4o-Mini emphasizes more factual and right-leaning sources; Claude-3.7-Sonnet favors institutional and civil-society domains and slightly amplifies right-leaning exposure; and Gemini-2.0-Flash exhibits a modest left-leaning tilt without significant changes in factuality. These patterns remain robust under prompt variations and alternative reliability benchmarks. Together, our findings show that LLMs already enact "agentic editorial policies", curating information in ways that diverge from conventional aggregators. Understanding and governing their emerging editorial power will be critical for ensuring transparency, pluralism, and trust in digital information ecosystems.

Paper Structure

This paper contains 19 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overview of the LLM-mediated news-seeking workflow. The agent queries the open web, retrieves and ranks sources, and synthesizes an answer from retrieved evidence; our study audits this largely opaque retrieval-and-generation pipeline.
  • Figure 2: Lorenz curves for the LLM agents and Google News, showing exposure inequality. Google News is the least unequal, while GPT-4o-Mini concentrates most of its attention on a few web domains.
  • Figure 3: Per-topic difference in the number of unique sources across LLM agents.
  • Figure 4: Differences in attention inequality (Gini Index) across LLM agents relative to Google News.
  • Figure 5: Stacked barplot of the ratio of each source category in the SERPs produced by each system.
  • ...and 3 more figures
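
The Lorenz curves mentioned in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' plotting code: the input is taken to be per-domain citation counts for one system, the function name `lorenz_points` is hypothetical, and the counts are toy values.

```python
def lorenz_points(counts):
    """Return Lorenz-curve points (share of domains, share of attention).

    Domains are sorted from least to most cited; each point gives the
    cumulative fraction of domains and the cumulative fraction of
    citations they receive. The curve always starts at (0, 0).
    """
    x = sorted(float(c) for c in counts)
    total = sum(x)
    points = [(0.0, 0.0)]
    if not x or total == 0:
        return points
    running = 0.0
    for i, value in enumerate(x, start=1):
        running += value
        points.append((i / len(x), running / total))
    return points


# Hypothetical toy counts: one domain cited 3 times, two cited once each.
for domain_share, attention_share in lorenz_points([3, 1, 1]):
    print(f"{domain_share:.2f} {attention_share:.2f}")
```

A perfectly equal system would trace the diagonal from (0, 0) to (1, 1); the further the curve sags below that diagonal, the more attention is concentrated on a few domains.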