Answer Bubbles: Information Exposure in AI-Mediated Search

Michelle Huang, Agam Goyal, Koustuv Saha, Eshwar Chandrasekharan

Abstract

Generative search systems are increasingly replacing link-based retrieval with AI-generated summaries, yet little is known about how these systems differ in sources, language, and fidelity to cited material. We examine responses to 11,000 real search queries across four systems (vanilla GPT, Search GPT, Google AI Overviews, and traditional Google Search) at three levels: source diversity, linguistic characterization of the generated summary, and source-summary fidelity. We find that generative search systems exhibit significant source-selection biases in their citations, favoring certain sources over others. Incorporating search also selectively attenuates epistemic markers, reducing hedging by up to 60% while preserving confidence language in the AI-generated summaries. At the same time, AI summaries further compound the citation biases: Wikipedia and longer sources are disproportionately overrepresented, whereas cited social media content and negatively framed sources are substantially underrepresented. Our findings highlight the potential for "answer bubbles", in which identical queries yield structurally different information realities across systems, with implications for user trust, source visibility, and the transparency of AI-mediated information access.

Paper Structure (14 sections, 2 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Paper Overview. Traditional search returns a ranked list of links for users to evaluate, while generative search produces an answer bubble containing AI-generated summaries synthesized from multiple sources. We study these answer bubbles along three dimensions: the sources they cite (RQ1), the linguistic and epistemic qualities of their summaries (RQ2), and how faithfully those summaries represent cited content (RQ3).
  • Figure 2: Top-15 cited domains by query topic for each system (% of queries citing each domain). Cell values $\geq$1% are shown. Domain preferences are strongly topic-dependent: IMDB dominates entertainment, ESPN dominates sports, and Spotify/Genius dominate music, but only in Google's systems, not in Search GPT.
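
The quantity reported in Figure 2 (the percentage of queries in a topic that cite a given domain) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (`topic`, `cited_urls`), the function name `domain_citation_share`, the `www.` normalization, and the sample data are all hypothetical. Only the top-15 cutoff and the 1% display threshold come from the caption.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_citation_share(records, min_pct=1.0, top_k=15):
    """Return {topic: [(domain, pct_of_queries), ...]}, highest share first.

    `records` is an iterable of dicts with the hypothetical keys
    "topic" (str) and "cited_urls" (list of URL strings), one dict per query.
    A domain is counted once per query, however often that query cites it.
    """
    queries_per_topic = Counter()
    citing_queries = defaultdict(Counter)  # topic -> domain -> number of queries

    for rec in records:
        topic = rec["topic"]
        queries_per_topic[topic] += 1
        domains = set()
        for url in rec["cited_urls"]:
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]  # assumed normalization: drop a leading "www."
            if host:
                domains.add(host)
        citing_queries[topic].update(domains)

    result = {}
    for topic, total in queries_per_topic.items():
        shares = [
            (domain, 100.0 * count / total)
            for domain, count in citing_queries[topic].items()
        ]
        # Keep only cells at or above the display threshold.
        shares = [(d, pct) for d, pct in shares if pct >= min_pct]
        # Sort by share (descending), breaking ties alphabetically by domain.
        shares.sort(key=lambda item: (-item[1], item[0]))
        result[topic] = shares[:top_k]
    return result


# Made-up sample data, for illustration only.
sample = [
    {"topic": "entertainment",
     "cited_urls": ["https://www.imdb.com/title/x", "https://www.imdb.com/name/y"]},
    {"topic": "entertainment",
     "cited_urls": ["https://en.wikipedia.org/wiki/Z"]},
    {"topic": "sports", "cited_urls": []},
]
shares = domain_citation_share(sample)
```

Counting each domain at most once per query matches the caption's "% of queries citing each domain"; counting raw citations instead would measure a different quantity (share of citations) and would favor domains that are cited repeatedly within a single answer.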