Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs

Sean W. Kelley, Christoph Riedl

TL;DR

Personalization generally increases affective alignment in LLMs, while its effect on epistemic alignment depends on the role the model plays. The work provides measurement frameworks for evaluating personalized AI systems, demonstrates the necessity of role-sensitive evaluation, and establishes a novel benchmark to assess goal alignment.

Abstract

Large Language Models (LLMs) are prone to sycophantic behavior, uncritically conforming to user beliefs. As models increasingly condition responses on user-specific context (personality traits, preferences, conversation history), they gain information to tailor agreement more effectively. Understanding how personalization modulates sycophancy is critical, yet systematic evaluation across models and contexts remains limited. We present a rigorous evaluation of personalization's impact on LLM sycophancy across nine frontier models and five benchmark datasets spanning advice, moral judgment, and debate contexts. We find that personalization generally increases affective alignment (emotional validation, hedging/deference), but affects epistemic alignment (belief adoption, position stability, resistance to influence) with context-dependent role modulation. When the LLM's role is to give advice, personalization strengthens epistemic independence (models challenge user presuppositions). When its role is that of a social peer, personalization decreases epistemic independence. In this role, extensively personalized user challenges cause LLMs to abandon their position at significantly higher rates. Robustness tests confirm that the effects are driven by personalized conditioning, not by additional input tokens per se or demographic information alone. Our work provides measurement frameworks for evaluating personalized AI systems, demonstrates the necessity of role-sensitive evaluation, and establishes a novel benchmark to assess goal alignment.

Paper Structure (31 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Social Sycophancy in r/AmITheAsshole (AITA) and Open-Ended Questions (OEQ). a) In scenarios posted by users to the AITA subreddit, LLM responses show elevated levels of hedging (deference) and emotional validation, but not of framing acceptance or responsibility attribution. b) In the OEQ dataset, personalization increases emotional validation and hedging/deference while decreasing framing acceptance. Bars indicate 95% confidence intervals. *$p<0.05$, **$p<0.01$, ***$p<0.001$
  • Figure 2: Effect of Personalization on AI’s Openness to User’s Opinion Statement (Agree or Challenge). a) Personalization generally increases AI’s response openness across models for both agreeing ("I strongly agree") and challenging ("I strongly disagree") user statements. b) The effect of personalization is specific to the inclusion of any type of personal data, rather than the presence of extra context, i.e., neutral text. Bars indicate 95% confidence intervals. *$p<0.05$, **$p<0.01$, ***$p<0.001$
  • Figure 3: Personalization Amplifies Sycophantic Behavior in Multi-Turn LLM Interactions. a) Diagram of procedure for generating persistent challenges to AI responses, example shown for a debate between 2 LLM agents (defender and challenger). Persistent challenges cause a significant increase in debate stance flip rate (b) and preference alignment rate (c).
  • Figure S1: Experimental Design Procedure for Evaluating Sycophantic Behavior with the Addition of User Personas. Example shown for Open-Ended Questions (OEQ) dataset.
  • Figure S2: Model-Level Comparison of Personalization Impact on Task Performance and Sycophantic Behavior in MMLU-Pro Philosophy (top panel; a: change in baseline accuracy, b: percent agreement with incorrect answer) and Law (bottom panel; c: change in baseline accuracy, d: percent agreement with incorrect answer) Datasets.