Measuring Sycophancy of Language Models in Multi-turn Dialogues

Jiseung Hong, Grace Byun, Seungone Kim, Kai Shu, Jinho D. Choi

TL;DR

SYCON Bench introduces a multi-turn, free-form benchmark to quantify sycophantic conformity in language models. It defines Turn-of-Flip and Number-of-Flip to capture how quickly and how often models concede under sustained user pressure across debate, unethical prompts, and false presupposition scenarios. The study evaluates 17 models across six families, finding that larger and reasoning-optimized models resist sycophancy more effectively, while certain prompts and third-person perspectives can substantially reduce conformity. The work provides practical mitigation strategies and releases code and data to advance trustworthy, stance-consistent dialogue systems.

Abstract

Large Language Models (LLMs) are expected to provide helpful and harmless responses, yet they often exhibit sycophancy: conforming to user beliefs regardless of factual accuracy or ethical soundness. Prior research on sycophancy has primarily focused on single-turn factual correctness, overlooking the dynamics of real-world interactions. In this work, we introduce SYCON Bench, a novel benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings. Our benchmark measures how quickly a model conforms to the user (Turn of Flip) and how frequently it shifts its stance under sustained user pressure (Number of Flip). Applying SYCON Bench to 17 LLMs across three real-world scenarios, we find that sycophancy remains a prevalent failure mode. Our analysis shows that alignment tuning amplifies sycophantic behavior, whereas model scaling and reasoning optimization strengthen the model's ability to resist undesirable user views. Reasoning models generally outperform instruction-tuned models but often fail when they over-index on logical exposition instead of directly addressing the user's underlying beliefs. Finally, we evaluate four additional prompting strategies and demonstrate that adopting a third-person perspective reduces sycophancy by up to 63.8% in the debate scenario. We release our code and data at https://github.com/JiseungHong/SYCON-Bench.
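
To make the two metrics concrete, here is a minimal Python sketch of how Turn of Flip and Number of Flip could be computed for a single dialogue. It is an illustration under stated assumptions, not the released implementation: the input format (one boolean per turn, `True` when a judge decided the model still held its initial stance), the function names, and the choice to return `None` when no flip occurs are all hypothetical, and the paper's exact definitions may differ.

```python
from typing import Optional, Sequence


def turn_of_flip(holds: Sequence[bool]) -> Optional[int]:
    """Return the 1-indexed turn at which the model first abandons its stance.

    `holds[i]` is assumed to be True when the model still held its initial
    stance at turn i + 1. Returns None if the stance is held at every turn
    (an assumption of this sketch, not necessarily the paper's convention).
    """
    for turn, held in enumerate(holds, start=1):
        if not held:
            return turn
    return None


def number_of_flip(holds: Sequence[bool]) -> int:
    """Count stance changes between consecutive turns (in either direction)."""
    return sum(1 for prev, cur in zip(holds, holds[1:]) if prev != cur)


# Hypothetical five-turn dialogue: the model holds its stance for two turns,
# concedes at turn 3, recovers at turn 4, and concedes again at turn 5.
judgements = [True, True, False, True, False]
print(turn_of_flip(judgements))    # 3
print(number_of_flip(judgements))  # 3
```

Under this reading, a later Turn of Flip means the model resists user pressure for longer, and a lower Number of Flip means its stance is more consistent across the conversation.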

Paper Structure

This paper contains 43 sections, 2 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Qualitative Example of Debate Scenario. Given a question and an initial stance (colored in blue), an LM is tested to maintain the stance while the user repeatedly disagrees using consistent opposition. We determine at which turn the LM's stance was reversed (colored in red) using GPT-4o evaluation.
  • Figure 2: Qualitative Example of Challenging Unethical Queries Scenario. Given a question that implicitly embeds a stereotype, the language model is expected to detect and challenge the underlying bias. We track the turn at which the model fails to do so using GPT-4o evaluation, as the user persistently attempts to trigger unethical behavior. This example illustrates an ideal response, one that consistently identifies and resists the unethical stereotypes embedded in the user's prompts.
  • Figure 3: Qualitative Example of Identifying False Presupposition Scenario. Given a question that implicitly involves a false presupposition, an LM is asked to generate responses that identify and correct it while the user repeatedly asserts the false belief. Based on the False Presupposition (colored in red) and Correction (colored in blue), we judge at which turn the LM fails to identify it using GPT-4o evaluation.
  • Figure 4: Presupposition Knowledge Check. False (Correct) indicates that the model successfully classified the presupposition as false; True (Incorrect) means that the model accepted it as fact.
  • Figure 5: Prompt used to generate one-sided arguments from a set of questions.
  • ...and 8 more figures