LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Maojia Song, Tej Deep Pala, Ruiwen Zhou, Weisheng Jin, Amir Zadeh, Chuan Li, Dorien Herremans, Soujanya Poria
TL;DR
KAIROS is a socially grounded benchmark for studying LLM behavior in multi-agent systems, focusing on rapport, current peer behavior, and self-confidence. The framework supports dynamic data construction and tailored social scenarios to measure accuracy, robustness, utility, and resistance under peer influence. Across prompting, SFT, and GRPO, results show that model scale moderates susceptibility to social cues: GRPO plus MAS context delivers the strongest accuracy gains while maintaining robustness for larger models, whereas smaller models remain vulnerable. A key finding is that MCQ formats can mask conformity effects relative to open-ended tasks. This underscores a persistent challenge: accuracy gains must go hand in hand with robust, deception-resistant social reasoning for multi-agent collaboration to be reliable.
Abstract
Large language models (LLMs) are increasingly integrated into multi-agent systems (MAS), where peer interactions shape individual decisions. While prior work has mainly examined conformity bias, we broaden the view to include how LLMs build rapport from prior interactions, discern and integrate high-quality peer information, and resist misleading inputs. These abilities are essential for achieving collective intelligence under complex social dynamics. We introduce KAIROS, a benchmark that simulates quiz-style collaboration with peer agents whose rapport levels and behaviours can be precisely controlled in both historical interactions and the current round. This unified setup enables systematic analysis of how rapport, peer actions, and the model's self-confidence jointly influence decision-making. Using KAIROS, we evaluate prompting, supervised fine-tuning, and reinforcement learning via Group Relative Policy Optimisation (GRPO). Results show that model scale is a primary factor moderating susceptibility to social influence: larger models are more resilient and benefit from prompting-based mitigation, whereas smaller models remain vulnerable. Only carefully configured GRPO training yields consistent robustness and performance gains for small models.
