BASIL: Bayesian Assessment of Sycophancy in LLMs
Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani
TL;DR
This work probes sycophancy in large language models through a Bayesian lens, introducing BASIL to quantify both descriptive shifts in beliefs and normative deviations from Bayesian rationality without ground-truth data. By eliciting LLMs’ probabilistic beliefs and comparing their reported posteriors to Bayes-optimal posteriors, the framework assesses how sycophancy affects rational reasoning across uncertain tasks (conversation forecasting, morality judgments, cultural acceptability). The results reveal substantial Bayesian error in LLMs and a tendency for sycophancy to shift beliefs in the direction of the user's steering, with mixed effects on overall rationality and only weak links to calibration. The work provides a principled, ground-truth-free methodology and releases a Python package to enable broader normative analysis of sycophancy in LLMs, with implications for safer human–AI collaboration in decision-making domains.
Abstract
Sycophancy (overly agreeable or flattering behavior) is critical to understand in the context of human-AI collaboration, especially in decision-making settings like health, law, and education. Existing methods for studying sycophancy in LLMs are either descriptive (they study behavior change when sycophancy is elicited) or normative (they provide a values-based judgment on that behavior change). Together, these approaches help us understand the extent and impacts of sycophancy. However, existing normative approaches apply only to objective tasks where ground-truth data exists, ignoring the natural subjectivity in many NLP tasks. Drawing from behavioral economics and rational decision theory, we introduce a Bayesian framework to study the normative effects of sycophancy on rationality in LLMs, without requiring labeled ground truth. Using this interdisciplinary framework, we study sycophantic behavior in multiple LLM baselines across three different tasks, experimenting with various methods for eliciting sycophancy and obtaining probability judgments from LLMs. We find significant evidence of sycophancy in our experiments (7 of 8 baselines for one of our probing techniques), and observe that sycophancy is more likely to reduce rationality than it is to increase rationality in LLMs' decisions when they are directly probed for probabilities (2 out of 4 baselines show significant increases overall).
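The core comparison described above, between a model's reported posterior and the posterior that Bayes' rule implies from its own elicited beliefs, can be illustrated with a minimal sketch. This is not the released package's API: the function names, the use of absolute difference as the error measure, and all numeric values are hypothetical assumptions chosen for illustration.

```python
# Minimal sketch (assumptions: binary hypothesis H, a single piece of evidence E,
# absolute difference as the error measure; not the paper's actual implementation).

def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes-optimal P(H | E) implied by an elicited prior and likelihoods."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under the elicited beliefs")
    return p_e_given_h * prior / evidence


def bayesian_error(reported_posterior: float, prior: float,
                   p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Gap between the reported posterior and the Bayes-optimal one."""
    optimal = bayes_posterior(prior, p_e_given_h, p_e_given_not_h)
    return abs(reported_posterior - optimal)


# Hypothetical elicited beliefs (illustrative values, not results from the paper).
prior = 0.40             # P(H) before seeing the evidence
p_e_given_h = 0.70       # P(E | H)
p_e_given_not_h = 0.30   # P(E | not H)

optimal = bayes_posterior(prior, p_e_given_h, p_e_given_not_h)

# Hypothetical reported posteriors without and with a user pushing toward H.
neutral_error = bayesian_error(0.60, prior, p_e_given_h, p_e_given_not_h)
sycophantic_error = bayesian_error(0.85, prior, p_e_given_h, p_e_given_not_h)

print(f"Bayes-optimal posterior: {optimal:.3f}")
print(f"error (neutral prompt): {neutral_error:.3f}")
print(f"error (sycophancy-eliciting prompt): {sycophantic_error:.3f}")
```

Because the reference posterior is computed from the model's own prior and likelihoods, no labeled ground truth is needed: the comparison measures internal consistency with Bayes' rule rather than accuracy against an external label.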
