BASIL: Bayesian Assessment of Sycophancy in LLMs

Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani

TL;DR

This work probes sycophancy in large language models through a Bayesian lens, introducing BASIL to quantify both descriptive shifts in beliefs and normative deviations from Bayesian rationality without ground-truth data. By eliciting LLMs’ probabilistic beliefs and comparing their posterior updates to Bayes-optimal posteriors, the framework assesses how sycophancy affects rational reasoning across uncertain tasks (conversation forecasting, morality judgments, cultural acceptability). The results reveal substantial Bayesian error in LLMs and a tendency for sycophancy to shift beliefs in the direction the model is steered, with mixed effects on overall rationality and only weak links to calibration. The work provides a principled, ground-truth-free methodology and releases a Python package to enable broader normative analysis of sycophancy in LLMs, with implications for safer human–AI collaboration in decision-making domains.

Abstract

Sycophancy (overly agreeable or flattering behavior) is critical to understand in the context of human-AI collaboration, especially in decision-making settings like health, law, and education. Existing methods for studying sycophancy in LLMs are either descriptive (study behavior change when sycophancy is elicited) or normative (provide values-based judgment on behavior change). Together, these approaches help us understand the extent and impacts of sycophancy. However, existing normative approaches apply only to objective tasks where ground-truth data exists, ignoring the natural subjectivity in many NLP tasks. Drawing from behavioral economics and rational decision theory, we introduce a Bayesian framework to study the normative effects of sycophancy on rationality in LLMs, without requiring labeled ground-truth. Using this interdisciplinary framework, we study sycophantic behavior in multiple LLM baselines across three different tasks, experimenting with various methods for eliciting sycophancy and obtaining probability judgments from LLMs. We find significant evidence of sycophancy in our experiments (7 of 8 baselines for one of our probing techniques), and observe that sycophancy is more likely to reduce rationality than it is to increase rationality in LLMs' decisions when they are directly probed for probabilities (2 out of 4 baselines show significant increases overall).
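To make the idea of comparing an elicited posterior with a Bayes-optimal one concrete, the following is a minimal sketch for a binary hypothesis. It is not the released package's API: the function names `bayes_posterior` and `bayesian_error` are hypothetical, and measuring the error as an absolute difference is an assumption made here for illustration, not necessarily the paper's definition.

```python
def bayes_posterior(prior: float, lik_h: float, lik_not_h: float) -> float:
    """Return P(H | E) from an elicited prior P(H) and the elicited
    likelihoods P(E | H) and P(E | not H), for a binary hypothesis H."""
    evidence = prior * lik_h + (1.0 - prior) * lik_not_h
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under these beliefs")
    return prior * lik_h / evidence


def bayesian_error(elicited_posterior: float, prior: float,
                   lik_h: float, lik_not_h: float) -> float:
    """Assumed error measure: absolute gap between the posterior the model
    reports and the posterior implied by its own prior and likelihoods."""
    return abs(elicited_posterior - bayes_posterior(prior, lik_h, lik_not_h))


if __name__ == "__main__":
    # Toy, made-up beliefs: prior 0.5, P(E|H) = 0.8, P(E|not H) = 0.2.
    print(bayes_posterior(0.5, 0.8, 0.2))
    # The model reports 0.9 after seeing the evidence.
    print(bayesian_error(0.9, 0.5, 0.8, 0.2))
```

Because every quantity comes from the model's own elicited beliefs, a check of this kind needs no ground-truth label, which is the property the abstract emphasizes.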

Paper Structure

This paper contains 106 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An illustration of our Bayesian framework for studying sycophancy in LLMs, motivated by behavioral economics.
  • Figure 2: An illustration of our framework for calculating Bayesian rationality based on LLMs' elicited beliefs.
  • Figure 3: Directionality and extent of model updating for the posterior compared to the Bayesian posterior, when the agent's argument serves as the evidence, without sycophancy (top) and with sycophancy (bottom). We find a tendency for models to over-update more than under-update when they update in the correct direction, but many baseline and task combinations show a high likelihood of updating in the incorrect direction.
  • Figure 4: Association between change in Brier Score due to sycophancy and change in Bayesian error due to sycophancy, with our direct probing (left) and hybrid (right) strategy. We find evidence of a weak positive correlation between the two.
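The quantities named in the captions for Figures 3 and 4 can be sketched as follows. This is an illustrative assumption about how one might label an update and score calibration, not the authors' implementation: `classify_update` and its category labels are hypothetical names, and `brier_score` is the standard mean squared difference between predicted probabilities and binary outcomes.

```python
from typing import Sequence


def classify_update(prior: float, elicited_posterior: float,
                    bayes_post: float) -> str:
    """Label the model's belief shift relative to the Bayesian shift."""
    model_shift = elicited_posterior - prior
    bayes_shift = bayes_post - prior
    if model_shift * bayes_shift < 0:
        return "wrong_direction"   # moved away from the Bayesian posterior
    if abs(model_shift) > abs(bayes_shift):
        return "over_update"       # right direction (or none expected), too far
    if abs(model_shift) < abs(bayes_shift):
        return "under_update"      # right direction, not far enough
    return "bayesian"              # shift matches the Bayesian shift


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    # Toy, made-up values for illustration only.
    print(classify_update(prior=0.5, elicited_posterior=0.9, bayes_post=0.8))
    print(brier_score([0.8, 0.3], [1, 0]))
```

Unlike the Bayesian comparison, the Brier score does require observed outcomes, so the association in Figure 4 can only be examined where such labels are available.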