Human Trust in AI Search: A Large-Scale Experiment

Haiwen Li, Sinan Aral

TL;DR

This study investigates how GenAI search shapes human trust and behavior, addressing the trust gap created by generative search designs. It combines a global exposure assessment with a preregistered US randomized experiment to test trust cues, including references, uncertainty highlighting, social feedback, and explanations, using a 5-item trust metric and willingness-to-share as primary outcomes. Analyzing two regression specifications, the authors find that GenAI generally lowers trust compared with traditional search, but trust can be manipulated upward by providing references (even when invalid) and responses with explicit social signals; uncertainty highlighting consistently reduces trust. The results highlight demographic and topical heterogeneity in susceptibility to GenAI misrepresentations and show that trust translates into behavior (more clicks, less evaluation), underscoring the need for careful GenAI interface design to mitigate the trust gap and promote safer information seeking.

Abstract

Large Language Models (LLMs) increasingly power generative search engines which, in turn, drive human information seeking and decision making at scale. The extent to which humans trust generative artificial intelligence (GenAI) can therefore influence what we buy, how we vote and our health. Unfortunately, no work establishes the causal effect of generative search designs on human trust. Here we execute ~12,000 search queries across seven countries, generating ~80,000 real-time GenAI and traditional search results, to understand the extent of current global exposure to GenAI search. We then use a preregistered, randomized experiment on a large study sample representative of the U.S. population to show that while participants trust GenAI search less than traditional search on average, reference links and citations significantly increase trust in GenAI, even when those links and citations are incorrect or hallucinated. Uncertainty highlighting, which reveals GenAI's confidence in its own conclusions, makes us less willing to trust and share generative information whether that confidence is high or low. Positive social feedback increases trust in GenAI while negative feedback reduces trust. These results imply that GenAI designs can increase trust in inaccurate and hallucinated information and reduce trust when GenAI's certainty is made explicit. Trust in GenAI varies by topic and with users' demographics, education, industry employment and GenAI experience, revealing which sub-populations are most vulnerable to GenAI misrepresentations. Trust, in turn, predicts behavior, as those who trust GenAI more click more and spend less time evaluating GenAI search results. These findings suggest directions for GenAI design to safely and productively address the AI "trust gap."

Paper Structure

This paper contains 30 sections, 5 equations, 14 figures, 24 tables.

Figures (14)

  • Figure 1: Global Exposure to GenAI Search Results. Figure 1 displays (A) the fractions of query searches with AI results across seven topics. Queries mentioning “Covid” or “Coronavirus” were removed from “Health” and grouped together under the “Covid” group; (B) the fractions of queries with AI results across the seven countries that had the AI Overview feature publicly available at the time of our data collection: US, UK, India, Japan, Indonesia, Mexico, and Brazil; (C) feature importance from a random forest model using all features to predict whether a query search returns AI results. Features (country, style, topic) are ranked by Gini importance, with higher scores indicating greater importance of the features; (C inset) the fraction of queries with AI results, searched in all seven countries, across three query styles—question, statement and navigational; and (D) a comparison of the performance of four random forest models in predicting whether a search would have an AI overview. Each model was trained with a different feature set (country, topic, style, or all) using repeated 5-fold cross-validation, generating 500 AUC scores per model. The figure presents a box plot overlaid on a scatterplot of the 500 AUC scores for each model.
  • Figure 2: Generative Search and Trust. Figure 2 displays (A) the average treatment effects of providing generative search information, relative to trust in traditional search, as well as the heterogeneity of these treatment effects by subjects' (B) education level, (C) employment in technology-related industries, frequency of (D) GenAI and (E) generative search use, and (F) political leaning.
  • Figure 3: Trust Effects of Valid and Invalid GenAI References. Figure 3 displays (A) the average treatment effects of providing references and reference links in support of the information provided in GenAI search results on participants’ trust and willingness to share GenAI search results. Search task responses were randomized to contain either valid and correct or invalid and hallucinated references. Panel (B) displays differences in the effects of providing either valid or invalid GenAI references and links. The remaining panels display the heterogeneity in the average treatment effects of providing reference links, pooled across both valid and invalid links, by subjects' (B) education level, (C) employment in technology-related industries, frequency of (D) GenAI and (E) generative search use, and (F) political leaning.
  • Figure 4: Trust Effects of Uncertainty Highlighting, Social Feedback and GenAI Explanations. Figure 4 displays (A) the average treatment effects of uncertainty highlighting and (B) the effects of low certainty highlighting compared to both high and low certainty highlighting, as well as the effects of (C) negative versus positive social feedback and (D) an explanation of how GenAI works to create generative search results on participants' trust in and willingness to share generative search information. Panel (E) displays variation in the treatment effects of providing explanations by participants' education levels and Panel (F) displays variation in the treatment effects of providing explanations by participants' prior frequency of GenAI use.
  • Figure 5: Trust Effects of GenAI Designs Across Search Topics. Figure 5 displays (A) the average treatment effects of providing generative search information on trust in and willingness to share GenAI information, compared to the traditional search levels of trust, across the nine search topics in our study. Panel (B) displays the effects of GenAI designs that include references, uncertainty highlighting, explanations and negative versus positive feedback on trust in and willingness to share GenAI information, across the same nine search topics. Panel (C) displays levels of trust across traditional search and all the GenAI search designs, across the nine search topics.
  • ...and 9 more figures
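
The evaluation protocol in the Figure 1 caption (repeated 5-fold cross-validation producing 500 AUC scores per model) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the 100 repeats are inferred from 500 scores over 5 folds, the data are synthetic with invented per-country rates, and a simple per-country frequency lookup stands in for the random forest so the sketch needs only the Python standard library. The names `auc`, `repeated_kfold_auc`, and `rows` are hypothetical.

```python
import random
from collections import defaultdict

def auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) identity; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when a fold has only one class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_kfold_auc(rows, n_splits=5, n_repeats=100, seed=0):
    """rows: list of (category, label). Returns one AUC per held-out fold."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        idx = list(range(len(rows)))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for held_out in folds:
            held = set(held_out)
            # Stand-in model (NOT a random forest): the training-fold
            # fraction of positive labels within each category.
            counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
            for i in idx:
                if i not in held:
                    cat, y = rows[i]
                    counts[cat][0] += y
                    counts[cat][1] += 1
            labels, preds = [], []
            for i in held_out:
                cat, y = rows[i]
                pos_count, total = counts[cat]
                labels.append(y)
                preds.append(pos_count / total if total else 0.5)
            results.append(auc(labels, preds))
    return results

# Synthetic data: the seven countries come from the caption, but the
# rates of AI results below are invented purely for illustration.
countries = ["US", "UK", "India", "Japan", "Indonesia", "Mexico", "Brazil"]
data_rng = random.Random(42)
rows = []
for k, country in enumerate(countries):
    rate = 0.2 + 0.1 * k  # hypothetical rate, not a result from the paper
    rows += [(country, 1 if data_rng.random() < rate else 0) for _ in range(100)]

scores = repeated_kfold_auc(rows)
print(len(scores), "AUC scores; mean =", round(sum(scores) / len(scores), 3))
```

Each repeat reshuffles the data before splitting, so the 500 scores reflect variation across both folds and partitions, which is the spread a box plot over a scatterplot would display.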