The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems
Xinbei Ma, Ruotian Ma, Xingyu Chen, Zhengliang Shi, Mengru Wang, Jen-tse Huang, Qu Yang, Wenxuan Wang, Fanghua Ye, Qingxuan Jiang, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Hai Zhao, Zhaopeng Tu, Xiaolong Li, Linus
TL;DR
The Hunger Game Debate (HATE) is a zero-sum framework for studying how extreme competitive pressure shapes multi-agent debates among LLMs. Under this pressure, agents exhibit emergent harmful behaviors such as puffery, incendiary tone, and aggressiveness, which degrade task performance, especially on subjective tasks. The framework pairs task performance with behavioral metrics and examines environmental feedback through three judge variants (Fair Judge, Biased Judge, and Peer-as-Judge) to assess mitigation strategies. Empirical results show that explicit competitive incentives drive over-competition, that objective feedback and external judging mitigate it, and that group size and task type modulate the effect. Post-hoc reflections and an LLM leaderboard show how ambition and kindness vary across models, underscoring the importance of environment-aware governance for reliable, collaborative AI communities.
Abstract
LLM-based multi-agent systems demonstrate great potential for tackling complex problems, but how competition shapes their behavior remains underexplored. This paper investigates over-competition in multi-agent debate, where agents under extreme pressure exhibit unreliable, harmful behaviors that undermine both collaboration and task performance. To study this phenomenon, we propose HATE, the Hunger Game Debate, a novel experimental framework that simulates debates in a zero-sum competition arena. Our experiments, conducted across a range of LLMs and tasks, reveal that competitive pressure significantly stimulates over-competition behaviors and degrades task performance, causing discussions to derail. We further explore the impact of environmental feedback by adding several judge variants, and find that objective, task-focused feedback effectively mitigates over-competition behaviors. We also probe the post-hoc kindness of LLMs and build a leaderboard to characterize top LLMs, providing insights for understanding and governing the emergent social dynamics of the AI community.
