
Algorithmic amplification of biases on Google Search

Hussam Habib, Ryan Stoldt, Andrew High, Brian Ekdale, Ashley Peterson, Katy Biddle, Javie Ssozi, Rishab Nithyanand

TL;DR

This work examines how preexisting attitudes shape Google Search results on abortion by combining survey data with task-driven searches. It uses a multimodal representation of queries (vocabulary, style, semantics) and embedding-based ideology measures to link user attitudes to retrieved results, showing that query vocabulary mediates the effect of attitudes on results and that personalization also plays a role. The findings reveal significant differences in the domains, titles, and snippets shown to pro-life and pro-choice groups, with evidence of epistemic bubbles reinforced by collaborative filtering and search-history signals. The study highlights the potential for algorithmic amplification of political biases in modern information seeking and discusses implications for democratic information ecosystems and for generalizability to other topics.

Abstract

The evolution of information-seeking processes, driven by search engines like Google, has transformed how people access information. This paper investigates how individuals' preexisting attitudes influence the modern information-seeking process, specifically the results presented by Google Search. Through a comprehensive study involving surveys and information-seeking tasks focused on the topic of abortion, the paper provides four key insights: 1) Individuals with opposing attitudes on abortion receive different search results. 2) Individuals express their beliefs through the vocabulary they use to formulate search queries, which shapes the outcome of the search. 3) The user's search history contributes to divergent results among those with opposing attitudes. 4) Google Search reinforces preexisting beliefs through the results it returns. Overall, this study provides insights into the interplay between human biases and algorithmic processes, highlighting the potential for information polarization in modern information-seeking processes.

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, and 1 table.

Figures (4)

  • Figure 1: The modern information-seeking process. We investigate the following relationships: RQ1. Do preexisting attitudes influence search results? RQ2. How does query formulation mediate the influence of attitudes on results? RQ3. How do search history and personalization influence search results? RQ4. Do the results reinforce preexisting attitudes?
  • Figure 2: KDE distribution of the similarity of queries and results to the neutral baseline. For political prompts, unlike non-political prompts, search queries and results show different degrees of similarity to the baseline for pro-life and pro-choice users.
  • Figure 3: Comparison of query language, result titles, and domains between pro-life and pro-choice groups. Vocabulary frequency in queries: this chart shows the vocabulary and phrases (unigrams and bigrams) used in search queries by pro-life and pro-choice participants. The score ranges from -1 to 1 and indicates how likely a term is to be used by one group rather than the other. A positive score means the term is used more frequently by pro-life participants, with 1 signifying exclusive use by that group; a negative score means it is used more frequently by pro-choice participants, with -1 signifying exclusive use by them. Only vocabulary and phrases found in more than 5 … Vocabulary in search result titles: this part of the figure shows how often specific words appear in the titles of search results shown to participants in each group. Domain frequency in search results: this section shows how often different website domains appear in the search results provided to pro-life and pro-choice participants.
  • Figure 4: Mediation analysis of the influence of abortion attitudes on results through queries for the political prompt. The table presents mediation results for the first political prompt. In the mediation model, "a" represents the effect of abortion attitudes on query vocabulary, "b" represents the effect of query vocabulary on results, and "c'" denotes the direct effect of attitudes on results. The $R^2$ value quantifies how much of the variance in the results is explained by the attitudes, queries, and their interaction. An asterisk (*) indicates statistically significant results ($p$ < 0.05).
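The baseline comparison in Figure 2 can be illustrated with a minimal sketch, assuming each query or result has already been mapped to an embedding vector and the neutral baseline is a single embedding. The caption does not say how embeddings were produced or how the KDE bandwidth was chosen, so the toy vectors, the function names, and the fixed `bandwidth` parameter below are hypothetical. The sketch computes the cosine similarity of each item to the baseline and evaluates a Gaussian kernel density estimate over those similarities, one density per group.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarities_to_baseline(embeddings: list[list[float]], baseline: list[float]) -> list[float]:
    """Similarity of each query/result embedding to the neutral baseline embedding."""
    return [cosine(e, baseline) for e in embeddings]

def gaussian_kde(samples: list[float], bandwidth: float):
    """Return a density function: the average of Gaussian kernels centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical toy embeddings; real ones would come from some sentence-embedding model.
baseline = [1.0, 0.0, 0.0]
pro_life = [[0.9, 0.4, 0.0], [0.8, 0.6, 0.0]]
pro_choice = [[0.9, 0.0, 0.4], [0.5, 0.0, 0.9]]

kde_life = gaussian_kde(similarities_to_baseline(pro_life, baseline), bandwidth=0.05)
kde_choice = gaussian_kde(similarities_to_baseline(pro_choice, baseline), bandwidth=0.05)
for x in (0.5, 0.7, 0.9):
    # Compare the two groups' estimated densities at a few similarity values.
    print(f"{x:.1f}  pro-life={kde_life(x):.3f}  pro-choice={kde_choice(x):.3f}")
```

Evaluating both density functions over a fine grid of similarity values yields curves of the kind the caption describes; in practice a library KDE with automatic bandwidth selection would likely replace the fixed bandwidth used here.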
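The Figure 3 caption defines the query-vocabulary score only by its range and endpoints. One formula consistent with that description, assumed here rather than taken from the paper, is (p_life - p_choice) / (p_life + p_choice), where p is the fraction of a group's queries that contain the term. It equals 1 when only pro-life participants use a term, -1 when only pro-choice participants use it, and 0 when both groups use it at the same rate. The function names and the lowercase whitespace tokenization are illustrative, and no minimum-frequency filter is applied.

```python
from collections import Counter

def ngrams(query: str) -> set[str]:
    """Unigrams and bigrams of a lowercased, whitespace-tokenized query."""
    tokens = query.lower().split()
    grams = set(tokens)
    grams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return grams

def query_share(queries: list[str]) -> dict[str, float]:
    """Fraction of a group's queries that contain each n-gram (counted once per query)."""
    counts = Counter()
    for q in queries:
        counts.update(ngrams(q))
    return {g: c / len(queries) for g, c in counts.items()}

def preference_scores(pro_life: list[str], pro_choice: list[str]) -> dict[str, float]:
    """Assumed score in [-1, 1]: +1 = used only by pro-life, -1 = used only by pro-choice."""
    p_life, p_choice = query_share(pro_life), query_share(pro_choice)
    scores = {}
    for g in set(p_life) | set(p_choice):
        a, b = p_life.get(g, 0.0), p_choice.get(g, 0.0)
        scores[g] = (a - b) / (a + b)  # a + b > 0 because g occurs in at least one group
    return scores

# Hypothetical example queries, not data from the study.
scores = preference_scores(
    ["abortion risks", "unborn baby"],
    ["abortion clinic near me", "abortion pill"],
)
for term, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{score:+.2f}  {term}")
```

Here "abortion" appears in half of the pro-life queries and all of the pro-choice queries, so it receives a score of -1/3, while "unborn baby" scores 1 because only the pro-life group uses it.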
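The a, b, and c' paths in Figure 4 match the standard regression formulation of simple mediation: regress the mediator on the predictor to obtain a, then regress the outcome on both predictor and mediator to obtain c' (the direct effect) and b, with a*b as the indirect effect. The following minimal sketch assumes that attitudes, query vocabulary, and results have each been reduced to one number per participant; the paper's actual measures may be richer. It uses plain ordinary least squares, reports $R^2$ for the additive model Y ~ X + M without an interaction term, and omits the significance testing needed to reproduce the asterisks. The function and variable names are hypothetical.

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def centered_cross(u: list[float], v: list[float]) -> float:
    """Sum of products of deviations from the mean (n times the covariance)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation(x: list[float], m: list[float], y: list[float]) -> dict[str, float]:
    """Simple mediation X -> M -> Y via ordinary least squares (intercepts included).

    Returns a (X -> M), b (M -> Y given X), c_prime (direct X -> Y given M),
    the indirect effect a*b, and R^2 of the additive model Y ~ X + M.
    """
    sxx, smm, syy = centered_cross(x, x), centered_cross(m, m), centered_cross(y, y)
    sxm, sxy, smy = centered_cross(x, m), centered_cross(x, y), centered_cross(m, y)
    a = sxm / sxx                         # slope of M regressed on X
    det = sxx * smm - sxm ** 2            # zero if X and M are perfectly collinear
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    r2 = (c_prime * sxy + b * smy) / syy  # explained sum of squares / total sum of squares
    return {"a": a, "b": b, "c_prime": c_prime, "indirect": a * b, "r2": r2}

# Hypothetical per-participant scores: attitude (X), query vocabulary (M), results (Y).
attitude = [0.0, 1.0, 2.0, 3.0]
query_vocab = [1.0, 1.0, 3.0, 7.0]
result = [3.0, 3.5, 10.0, 22.5]
print(mediation(attitude, query_vocab, result))
```

The toy data are constructed so that query_vocab = 2*attitude plus a term uncorrelated with attitude, and result = 0.5*attitude + 3*query_vocab exactly; the sketch therefore recovers a = 2, b = 3, c' = 0.5, an indirect effect of 6, and $R^2$ = 1.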