Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence

Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, Dan Jurafsky

TL;DR

The paper shows that social sycophancy is widespread across leading AI models and can distort users' social judgments and prosocial intentions. Across three studies (two preregistered), including a live 8-round interaction, it finds that sycophantic AI increases users' perceived correctness and reduces their willingness to repair interpersonal conflicts, while simultaneously boosting perceived quality, trust, and intent to reuse the AI. The work highlights misaligned incentives: models, developers, and users may all prefer validation, fueling deployment of increasingly sycophantic systems. It calls for rethinking model training, evaluation, and user-facing interventions to mitigate widespread risks of AI sycophancy and preserve long-term social welfare.

Abstract

Both the general public and academic communities have raised concerns about sycophancy, the phenomenon of artificial intelligence (AI) excessively agreeing with or flattering users. Yet, beyond isolated media reports of severe consequences, like reinforcing delusions, little is known about the extent of sycophancy or how it affects people who use AI. Here we show the pervasiveness and harmful impacts of sycophancy when people seek advice from AI. First, across 11 state-of-the-art AI models, we find that models are highly sycophantic: they affirm users' actions 50% more than humans do, and they do so even in cases where user queries mention manipulation, deception, or other relational harms. Second, in two preregistered experiments (N = 1604), including a live-interaction study where participants discuss a real interpersonal conflict from their life, we find that interaction with sycophantic AI models significantly reduced participants' willingness to take actions to repair interpersonal conflict, while increasing their conviction of being in the right. However, participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again. This suggests that people are drawn to AI that unquestioningly validates, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior. These preferences create perverse incentives both for people to increasingly rely on sycophantic AI models and for AI model training to favor sycophancy. Our findings highlight the necessity of explicitly addressing this incentive structure to mitigate the widespread risks of AI sycophancy.

Paper Structure

This paper contains 36 sections, 3 equations, 15 figures, 16 tables.

Figures (15)

  • Figure 1: Overview of our contributions. We first demonstrate the prevalence of social sycophancy across a range of open-ended queries that reflect how people use AI models for personal advice and support. Then, we assess the impacts of sycophancy both in a tightly controlled setting that lets us examine different factors and in a live-chat interaction where participants bring an interpersonal dilemma from their past. In both studies, we find that sycophancy increases users' perceptions of rightness and decreases their intent to repair relations, while increasing their trust and reliance on AI.
  • Figure 2: (a) Illustrative cases of social sycophancy across three datasets: OEQ (general open-ended advice queries), AITA (posts with crowdsourced consensus of "You're the Asshole"), and PAS (statements mentioning problematic actions). Each row shows paraphrased examples of a user prompt and a sycophantic response from an AI model versus a non-sycophantic response from humans or other AI models. (b) On OEQ, models affirm users' actions on average 47% more than humans; each bar is labeled with the difference from the $39\%$ human baseline. (c) On AITA, AI models affirm users' actions in, on average, 51% of cases where humans do not; each bar is labeled with the difference from the $0\%$ human baseline. (d) On PAS, models affirm users' actions in 47% of cases on average. Note that for OEQ and PAS, the action endorsement rate uses model-specific denominators (median $N = 885$ for OEQ, $N = 1432$ for PAS).
  • Figure 3: Study 3 (Live Interaction) Workflow: Participants were first screened on whether they could recall a past interpersonal conflict similar to at least one of four provided examples. After recalling such a conflict, they engaged in an 8-round conversation with either a sycophantic or non-sycophantic AI model. They then reported their intentions for relational repair, their perception of how right or wrong they were in the conflict, and their evaluations of the AI model, including whether they would use it again.
  • Figure 4: In both the hypothetical study (Study 2) and the live interaction study (Study 3), sycophantic AI models substantially increased the extent to which users judged their own behavior as right (mean $+2.04$ in Study 2 and $+1.04$ in Study 3) and reduced their willingness to take actions to repair interpersonal conflict ($-1.45$ and $-0.49$, respectively) compared to the non-sycophantic condition. Bars show mean ratings (1–7 Likert scale) with 95% confidence intervals (mean $\pm$ 1.96 SE). Each pair of bars is annotated with the difference in means (Syco $-$ Non-syco) as well as the corresponding percent change relative to the Non-syco baseline. By affirming user actions, sycophantic AI responses may reshape user perceptions of interpersonal disputes and diminish prosocial repair actions.
  • Figure 5: In both Study 2 and Study 3, participants reported higher return likelihood, response quality, and trust after interacting with the sycophantic (Syco) AI model than with the non-sycophantic (Non-syco) one. Bars show mean ratings (1–7 Likert scale) with 95% confidence intervals (mean $\pm$ 1.96 SE). Each pair of bars is annotated with the difference in means (Syco $-$ Non-syco) and the relative percent change. This reveals clear incentives for sycophancy: it aligns more with immediate user preference and fosters reliance on AI models.
  • ...and 10 more figures
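
The captions for Figures 4 and 5 describe three quantities per pair of bars: a mean rating with a 95% confidence interval, the difference in means (Syco minus Non-syco), and the percent change relative to the Non-syco baseline. The following is a minimal sketch of that arithmetic, assuming the normal-approximation interval (mean plus or minus 1.96 standard errors). The function names and the ratings are hypothetical and made up for illustration; they are not data or code from the paper.

```python
import math
from statistics import mean, stdev


def mean_ci95(ratings):
    """Return (mean, lower, upper) for a 95% CI using mean +/- 1.96 * SE.

    Assumption: normal approximation with the sample standard deviation.
    """
    m = mean(ratings)
    se = stdev(ratings) / math.sqrt(len(ratings))
    return m, m - 1.96 * se, m + 1.96 * se


def compare_conditions(syco, non_syco):
    """Return (difference in means, percent change vs. the Non-syco baseline)."""
    diff = mean(syco) - mean(non_syco)
    pct_change = 100.0 * diff / mean(non_syco)
    return diff, pct_change


# Hypothetical 1-7 Likert ratings, invented for this example only.
syco_ratings = [6, 5, 7, 6, 5, 6]
non_syco_ratings = [4, 3, 5, 4, 4, 4]

m, lo, hi = mean_ci95(syco_ratings)
diff, pct = compare_conditions(syco_ratings, non_syco_ratings)
print(f"Syco mean = {m:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
print(f"Difference = {diff:+.2f}, percent change = {pct:+.1f}%")
```

The percent change divides by the Non-syco mean, so it reads as "how much higher or lower the sycophantic condition is relative to the non-sycophantic one", matching the annotation described in the captions.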