Sycophancy Claims about Language Models: The Missing Human-in-the-Loop
Jan Batzner, Volker Stocker, Stefan Schmid, Gjergji Kasneci
TL;DR
The paper interrogates claims of sycophancy in large language models and critiques how current studies measure it, noting a lack of direct human-perception evaluation. It catalogs five automated measurement strategies (persona-based prompts, direct questioning, keyword misdirection, visual misdirection, and LLM-based judgments) and shows how methodological heterogeneity undermines comparability. It argues that sycophancy is inherently human-centric and that existing work often conflates it with related concepts such as personalization or agreeableness bias, limiting validity. The authors call for human-in-the-loop assessment, clearer terminology, and methods that explicitly measure human perceptions to establish robust, comparable metrics of AI sycophancy. These recommendations bear on how alignment-driven behaviors in language models are studied and on the design of evaluation benchmarks that reflect real user experience.
Abstract
Sycophantic response patterns in Large Language Models (LLMs) have been increasingly claimed in the literature. We review methodological challenges in measuring LLM sycophancy and identify five core operationalizations. Despite sycophancy being inherently human-centric, current research does not evaluate human perception. Our analysis highlights the difficulties in distinguishing sycophantic responses from related concepts in AI alignment and offers actionable recommendations for future research.
