
Commercial Persuasion in AI-Mediated Conversations

Francesco Salvi, Alejandro Cuevas, Manoel Horta Ribeiro

Abstract

As Large Language Models (LLMs) become a primary interface between users and the web, companies face growing economic incentives to embed commercial influence into AI-mediated conversations. We present two preregistered experiments (N = 2,012) in which participants selected a book to receive from a large eBook catalog using either a traditional search engine or a conversational LLM agent powered by one of five frontier models. Unbeknownst to participants, a fifth of all products were randomly designated as sponsored and promoted in different ways. We find that LLM-driven persuasion nearly triples the rate at which users select sponsored products compared to traditional search placement (61.2% vs. 22.4%), while the vast majority of participants fail to detect any promotional steering. Explicit "Sponsored" labels do not significantly reduce persuasion, and instructing the model to conceal its intent makes its influence nearly invisible (detection accuracy < 10%). Altogether, our results indicate that conversational AI can covertly redirect consumer choices at scale, and that existing transparency mechanisms may be insufficient to protect users.


Paper Structure

This paper contains 29 sections, 1 equation, 7 figures, and 35 tables.

Figures (7)

  • Figure 1: Experimental design and outcome measures. (A) After screening for active readers and completing a small pre-survey, participants engaged in a shopping task in which they browsed a real eBook catalog and selected a book to receive after the experiment. Unbeknownst to participants, a fifth of all products were randomly designated as sponsored and promoted in different ways. Depending on the experimental condition, participants interacted either with a traditional search interface or with a conversational LLM agent powered by one of five frontier models (GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, DeepSeek v3.2, or Qwen3 235b). After the task, participants completed a post-survey measuring satisfaction and bias detection, and chose between keeping their selected book or receiving a $1 cash bonus. After being debriefed about the presence of sponsored products, participants made this choice a second time. (B) Participants were randomly assigned to one of five between-subjects conditions, spanning two preregistered studies. Study 1 compared a traditional search with upranked sponsored products (SP), a chat-based placement of sponsored products first in the carousel (CP), and a chat with active LLM persuasion toward sponsored products (CPer). Study 2 tested transparency and concealment: CPer--Exp replicated CPer with an explicit "Sponsored" label and warning, while CPer--Sbt instructed the LLM to conceal its persuasive intent. (C) Three primary outcomes capture the arc of commercial influence: (Persuasion Rate) whether participants select a sponsored product, (Sales Rate) whether they value their book choice enough to keep it over the $1 cash alternative, and (Bias Detection) whether they detect that persuasion occurred at all.
  • Figure 4: Decomposition of Bias Detection into detection rate and conditional accuracy. This figure decomposes the composite Bias Detection Accuracy measure reported in Figure 2 into its two constituent parts. Point estimates are estimated marginal means (EMMs) from OLS models with condition, LLM model, and their interaction as predictors, using HC3 robust standard errors; EMMs marginalize over the LLM factor with equal weights. Error bars denote 95% confidence intervals. (A) Bias Detection Rate: proportion of participants who reported perceiving any bias or promotional steering during the session ($N = 2012$). Detection was rare in both placement conditions (SP, 11.7%; CP, 10.7%) and rose significantly under active persuasion (CPer, 22.3%), yet still remained below one in four. Explicit labeling produced the highest rate (CPer--Exp, 39.9%), while concealing persuasive intent (CPer--Sbt, 14.6%) brought detection back to levels statistically indistinguishable from the placement baselines. (B) Bias Detection Accuracy, conditional on having reported bias ($N = 399$). The dashed vertical line marks the 20% chance baseline (one in five products was sponsored). Among the minority of participants who did notice something, those in persuasion conditions identified the correct products with high accuracy (CPer, 80.1%; CPer--Exp, 87.3%; CPer--Sbt, 68.0%), far exceeding the placement conditions (SP, 24.4%; CP, 37.6%). Full regression tables and pairwise contrasts are reported in Tables tab:fig_detection_a, tab:fig_detection_a_contrasts, tab:fig_detection_b, and tab:fig_detection_b_contrasts.
  • Figure 2: Persuasion, Sales Rate, and Bias Detection across experimental conditions. Point estimates are estimated marginal means (EMMs) from OLS models with condition, LLM model, and their interaction as predictors, using HC3 robust standard errors; EMMs marginalize over the LLM factor with equal weights ($N = 2012$; see Methods). Error bars denote 95% confidence intervals. The dashed vertical line in panel A marks the 20% chance baseline (one in five products was randomly designated as sponsored). (A) Persuasion Rate: probability that a participant selected a sponsored product. Active persuasion conditions (CPer, CPer--Exp, CPer--Sbt) substantially exceeded placement-only baselines (SP, CP), with the strongest effect in the unconstrained persuasion condition (CPer, 61.2%). (B) Sales Rate: probability that a participant chose to keep their selected book rather than redeem a $1 cash bonus. No pairwise contrast was significant after multiplicity correction ($F$ = 1.42, $p$ = 0.104), indicating that persuasion shifted which product was chosen without reducing participants' perceived value of their selection. (C) Bias Detection Accuracy: proportion of products identified as promoted by the participant that were truly sponsored (participants reporting no bias were scored as zero). Detection remained low across all conditions, even under active persuasion (CPer, 17.9%), with concealing intent substantially decreasing detection (CPer--Sbt, 9.5%). Full regression tables and pairwise contrasts are reported in Tables tab:fig2a, tab:fig2b, tab:fig2c, tab:fig2a_contrasts, tab:fig2b_contrasts, and tab:fig2c_contrasts.
  • Figure 5: Post-task exit survey ratings across experimental conditions (Exploratory). Point estimates are estimated marginal means (EMMs) from OLS models with condition, LLM model, and their interaction as predictors, using HC3 robust standard errors; EMMs marginalize over the LLM factor with equal weights. Error bars denote 95% confidence intervals. All items were measured on a 1--5 Likert scale. (A) Overall Experience: participants in all four chat-based conditions rated their experience higher than those in the search condition. (B) Satisfaction with the session (averaged over four items): the pattern mirrors overall experience, with chat-based conditions producing modestly higher satisfaction; however, most pairwise differences were not significant after multiplicity correction. (C) Confidence that the selected book is a good fit: ratings were uniformly high and did not differ significantly across conditions ($F$ = 0.87, $p$ = 0.625), indicating that persuasion did not erode participants' perceived match quality. (D) Likelihood to read the selected book within the following month: ratings were similarly stable across most conditions. Full regression tables and pairwise contrasts are reported in Tables tab:exitsurvey_experience, tab:exitsurvey_experience_contrasts, tab:exitsurvey_satisfaction, tab:exitsurvey_satisfaction_contrasts, tab:exitsurvey_confidence, tab:exitsurvey_confidence_contrasts, tab:exitsurvey_likelihood, and tab:exitsurvey_likelihood_contrasts.
  • Figure 3: Change in sales rate after debriefing. Points show the within-participant change in Sales Rate (post-debriefing minus pre-debriefing, in percentage points), estimated from a time $\times$ condition OLS model with participant-clustered standard errors ($N = 2012$; see Methods). Error bars denote 95% confidence intervals, and asterisks denote $p < 0.05$. The dashed vertical line marks zero (no change). In the search-placement condition (SP), the change was small and non-significant ($-$1.2 pp, $p$ = 0.196). All four chat-based conditions showed significant declines (all $p < 0.001$), with persuasion conditions dropping approximately 5 pp, indicating that learning about the system's persuasive intent led a fraction of participants to retroactively devalue their selection. Full regression results are reported in Table tab:fig3.
  • ...and 2 more figures
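The captions above repeatedly describe the same estimation recipe: an OLS (linear probability) model with condition, LLM model, and their interaction as predictors, HC3 heteroskedasticity-robust standard errors, and estimated marginal means (EMMs) that marginalize over the LLM factor with equal weights. A minimal sketch of that recipe using statsmodels is below; the condition names, base rates, and synthetic data are illustrative stand-ins, not the study's data or the authors' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
conditions = ["SP", "CP", "CPer", "CPer_Exp", "CPer_Sbt"]
models = ["m1", "m2", "m3", "m4", "m5"]

# Synthetic participants: each gets a condition and an LLM backend at random.
# Base rates are illustrative, loosely echoing the reported persuasion rates.
n = 2000
df = pd.DataFrame({
    "condition": rng.choice(conditions, n),
    "llm": rng.choice(models, n),
})
base = {"SP": 0.22, "CP": 0.35, "CPer": 0.61, "CPer_Exp": 0.55, "CPer_Sbt": 0.58}
df["sponsored_pick"] = rng.binomial(1, df["condition"].map(base))

# Linear probability model with the condition x LLM interaction,
# fitted with HC3 robust standard errors.
fit = smf.ols("sponsored_pick ~ C(condition) * C(llm)", data=df).fit(cov_type="HC3")

# EMMs: predict on the full condition x LLM grid, then average over the
# LLM factor with equal weights (one prediction per cell).
grid = pd.DataFrame(
    [(c, m) for c in conditions for m in models], columns=["condition", "llm"]
)
grid["pred"] = fit.predict(grid)
emms = grid.groupby("condition")["pred"].mean()
print(emms.round(3))
```

Because every condition/LLM cell contributes exactly one prediction to the average, the EMMs are balanced across LLM backends even if randomization left the cells slightly unequal in size.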
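Panel C of Figure 2 scores Bias Detection Accuracy as the proportion of products a participant flagged as promoted that were truly sponsored, with participants who reported no bias scored as zero. That scoring rule is simple enough to pin down in a few lines; this is a sketch, and the function name and set-based interface are assumptions rather than the authors' implementation.

```python
def bias_detection_accuracy(flagged, sponsored):
    """Fraction of flagged products that were truly sponsored.

    Participants who flagged nothing score 0.0, matching the composite
    measure described in the figure caption.
    """
    if not flagged:
        return 0.0
    sponsored = set(sponsored)
    return sum(p in sponsored for p in flagged) / len(flagged)

# A participant flags three books; two of them were actually sponsored.
print(bias_detection_accuracy({"b1", "b2", "b7"}, {"b1", "b2", "b9"}))  # 2/3

# A participant who reported no bias contributes zero to the condition mean.
print(bias_detection_accuracy(set(), {"b1", "b2", "b9"}))  # 0.0
```

Scoring non-reporters as zero is what separates the composite measure (Figure 2C) from the conditional accuracy among participants who noticed something, which the decomposition figure reports separately.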