Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems

Vira Kasprova, Amruta Parulekar, Abdulrahman AlRabah, Krishna Agaram, Ritwik Garg, Sagar Jha, Nimet Beyza Bozdag, Dilek Hakkani-Tur

Abstract

Large language models (LLMs) often exhibit sycophancy: agreeing with a user's stance even when it conflicts with the model's own assessment. While prior work has mostly studied this in single-agent settings, it remains underexplored in collaborative multi-agent systems. We ask whether awareness of other agents' sycophancy levels influences discussion outcomes. To investigate this, we run controlled experiments with six open-source LLMs, providing agents with peer sycophancy rankings that estimate each peer's tendency toward sycophancy. These rankings are based on scores calculated using various static (pre-discussion) and dynamic (online) strategies. We find that providing sycophancy priors reduces the influence of sycophancy-prone peers, mitigates error cascades, and improves final discussion accuracy by an absolute 10.5%. Providing sycophancy priors is thus a lightweight, effective way to reduce sycophancy in discussion and improve downstream accuracy.
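As a concrete illustration of the intervention, below is a minimal sketch of a discussion with sycophancy priors. The helper `query_agent`, the `sycophancy_rank` mapping, and the prompt wording are hypothetical stand-ins, not the authors' implementation; only the overall flow (independent round-0 answers, peer answers plus sycophancy rankings in later rounds, majority vote at the end) follows the setup described in the abstract and Figure 1.

```python
from collections import Counter

# Hypothetical interface: query_agent(model_name, prompt) -> answer string.
# agents is the list of model names; sycophancy_rank maps model -> rank
# (rank 1 = least sycophancy-prone, higher = more prone).

def run_discussion(question, agents, sycophancy_rank, query_agent, n_rounds=5):
    # Round 0: each agent answers independently.
    answers = {a: query_agent(a, question) for a in agents}

    # Rounds 1..n_rounds-1: each agent sees peers' latest answers plus their
    # sycophancy rankings (the prior) and may freely revise its stance.
    for _ in range(1, n_rounds):
        new_answers = {}
        for a in agents:
            peer_info = "\n".join(
                f"{p} (sycophancy rank {sycophancy_rank[p]}): {answers[p]}"
                for p in agents if p != a
            )
            prompt = (
                f"{question}\n\nPeers' latest answers and their sycophancy rankings "
                f"(higher rank = more prone to agree regardless of correctness):\n"
                f"{peer_info}\n\nGive your (possibly revised) answer."
            )
            new_answers[a] = query_agent(a, prompt)
        answers = new_answers

    # Outcome: majority stance across agents in the final round.
    return Counter(answers.values()).most_common(1)[0][0]
```

Whether the ranking is static (BSS, computed from pre-discussion queries) or dynamic (DSS, updated online as the discussion unfolds) changes only how `sycophancy_rank` is obtained; the discussion loop itself is unchanged.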

Paper Structure

This paper contains 32 sections, 4 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Multi-Agent Discussion Pipeline. (a) Computing base sycophancy scores (BSS) from single-agent queries on five MMLU subjects (\ref{sec:data}). We also compute scores that involve discussion (\ref{bssdss}). (b) Running a $6$-agent discussion for $5$ rounds: Round 0 answers are obtained independently from the models; in rounds $m \in \{1,2,3,4\}$, each agent sees its peers’ latest answers and their sycophancy scores and may freely revise its stance. The discussion's outcome is the majority final-round stance across models.
  • Figure 2: Final accuracy of answers at the end of the discussion under the various experimental conditions. "Majority" indicates the accuracy of the majority consensus answer. Error bars show Wilson 95% confidence intervals. Bold outlines indicate $p < 0.05$ vs. Baseline (two-proportion $z$-test).
  • Figure 3: Round-by-round accuracy trajectories of models during baseline, BSS, DSS and DBSS experiments.
  • Figure 4: Pairwise influence of models in Baseline, BSS, DBSS, and DSS experiments. Each cell represents a Source model (row) and a Target model (column) and indicates how often the target model flips to match the source’s preceding stance. Flip counts are normalized by column, so each percentage denotes, for a given target, the proportion of its flips that came from each source (see the sketch after this list).
  • Figure 5: Individual agent sycophancy scores post-experiment, calculated from the final answers at the end of each discussion. Error bars show Wilson 95% confidence intervals. Bold outlines indicate $p < 0.05$ vs. Baseline (two-proportion $z$-test).
  • ...and 10 more figures
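The column normalization described for Figure 4 can be sketched as follows; the `flips` counts here are illustrative placeholders, not values from the paper.

```python
import numpy as np

# flips[i, j]: number of times target j flipped to match source i's preceding
# stance (hypothetical counts accumulated over discussion rounds).
flips = np.array([
    [0, 4, 2],
    [3, 0, 5],
    [1, 2, 0],
], dtype=float)

# Normalize each column to percentages: for each target (column), the share of
# its flips attributable to each source (row). Guard against empty columns.
col_totals = flips.sum(axis=0, keepdims=True)
influence_pct = 100.0 * flips / np.where(col_totals == 0, 1.0, col_totals)
```

Normalizing by column means each target's percentages sum to 100%, so sources are comparable per target regardless of how often that target flips overall.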