Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy

Desheng Hu, Joachim Baumann, Aleksandra Urman, Elsa Lichtenegger, Robin Forsberg, Aniko Hannak, Christo Wilson

TL;DR

This study probes Google's AI Overviews (AIO) and Featured Snippets (FS) in pregnancy and baby care searches to assess information quality in a high-stakes domain. Using a 1,508-query audit with a rigorous manual evaluation framework across prevalence, consistency, relevance, safeguards, sources, and sentiment, the authors reveal substantial AIO dominance and notable inconsistencies between co-displayed AIO and FS content. Safeguard cues are rare, while health-related sources predominate yet include low- and medium-credibility domains, and FS shows greater reliance on Shopping/Business domains, raising concerns about information reliability. The findings demonstrate the importance of robust quality controls and provide a transferable framework for auditing AI-mediated health information across high-stakes domains. The work highlights implications for user trust, decision-making in pregnancy-related contexts, and policy considerations for health information dissemination via AI-assisted search interfaces.

Abstract

Google Search increasingly surfaces AI-generated content through features like AI Overviews (AIO) and Featured Snippets (FS), which users frequently rely on despite having no control over their presentation. Through a systematic algorithm audit of 1,508 real baby care and pregnancy-related queries, we evaluate the quality and consistency of these information displays. Our robust evaluation framework assesses multiple quality dimensions, including answer consistency, relevance, presence of medical safeguards, source categories, and sentiment alignment. Our results reveal concerning gaps in information consistency: when AIO and FS appear on the same search result page, their information is inconsistent in 33% of cases. Despite high relevance scores, both features critically lack medical safeguards (present in just 11% of AIO and 7% of FS responses). While health and wellness websites dominate the source categories for both AIO and FS, FS also often links to commercial sources. These findings have important implications for public health information access and demonstrate the need for stronger quality controls in AI-mediated health information. Our methodology provides a transferable framework for auditing AI systems across high-stakes domains where information quality directly impacts user well-being.

Paper Structure

This paper contains 46 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Overview of our audit study methods and results on Google's AI Overviews (AIO) and Featured Snippets (FS) for pregnancy and baby care information. In the example screenshot, the AIO appears above the FS. The first sentence of the AIO is highlighted by Google to add emphasis. We find that AIO occur more frequently than FS (RQ1) and show considerable inconsistency with FS answers, which is more pronounced in highlighted pairs (RQ2). While AIO and FS responses are generally relevant (RQ3a), they provide safeguard cues infrequently (RQ3b), and FS sources come from commercial categories in significantly higher proportions than those of AIO or the ten blue links results (RQ3c). RQ4: We do not find evidence of "confirmation bias" in AIO answers, where a user's sentiment (positive/neutral/negative) is reflected in the query.
  • Figure 2: Fractional Appearance Distribution of AIO answer and FS answer by question type and question sentiment. (Note: same $n$ for both bars in each pair here.)
  • Figure 3: Fractional Consistency/Contradiction Distribution between AIO answer and FS answer by AIO visibility.
  • Figure 4: Fractional Consistency/Contradiction Distribution of AIO answer and FS answer pairs (including whole answer pairs and highlighted answer pairs) grouped by question type and sentiment.
  • Figure 5: Fractional Safeguard Label Distribution of AIO and FS answers by question type and sentiment.
  • ...and 5 more figures