The Subtle Art of Defection: Understanding Uncooperative Behaviors in LLM based Multi-Agent Systems
Devang Kulshreshtha, Wanyu Du, Raghav Jain, Srikanth Doss, Hang Su, Sandesh Swamy, Yanjun Qi
TL;DR
This paper examines how vulnerable LLM-based multi-agent systems are to uncooperative behaviors in shared-resource settings. It introduces a game-theory-inspired taxonomy of six uncooperative strategies and the GVSR (Generate, Verify, Score, Refine) pipeline, which synthesizes adaptive, multi-turn plans that simulate such behaviors. Empirically, cooperative agents sustain 100% survival with 0% overuse across 12 rounds, whereas any uncooperative strategy can trigger collapse within 1 to 7 rounds and induce substantial overuse and inequality. Ablations show that the GVSR components are essential for stronger destabilization and for stress-testing across three environments. The work demonstrates the need to design resilient multi-agent systems and provides a structured framework for evaluating robustness against sophisticated uncooperative behaviors. These insights have practical implications for enterprise deployments of autonomous AI systems and can guide future research on mitigation and its evaluation.
Abstract
This paper introduces a novel framework for simulating and analyzing how uncooperative behaviors can destabilize or collapse LLM-based multi-agent systems. Our framework includes two key components: (1) a game theory-based taxonomy of uncooperative agent behaviors, addressing a notable gap in the existing literature; and (2) a structured, multi-stage simulation pipeline that dynamically generates and refines uncooperative behaviors as agents' states evolve. We evaluate the framework via a collaborative resource management setting, measuring system stability using metrics such as survival time and resource overuse rate. Empirically, our framework achieves 96.7% accuracy in generating realistic uncooperative behaviors, validated by human evaluations. Our results reveal a striking contrast: cooperative agents maintain perfect system stability (100% survival over 12 rounds with 0% resource overuse), while any uncooperative behavior can trigger rapid system collapse within 1 to 7 rounds. These findings demonstrate that uncooperative agents can significantly degrade collective outcomes, highlighting the need for designing more resilient multi-agent systems.
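As a rough illustration of the two stability metrics named above (survival time and resource overuse rate), the following minimal sketch simulates a toy shared-resource game. Everything in it is an assumption made for illustration and is not taken from the paper: the doubling regeneration rule, the collapse threshold, the definition of overuse as a request above an equal sustainable share, and the names `simulate`, `cooperative`, `greedy`, and `RunResult` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical policy type: (current_stock, n_agents) -> requested harvest.
Policy = Callable[[float, int], float]


@dataclass
class RunResult:
    survival_rounds: int  # rounds completed before the resource collapsed
    overuse_rate: float   # fraction of requests above the sustainable share


def sustainable_share(stock: float, n_agents: int) -> float:
    # Assumed dynamics: the stock doubles each round (capped at capacity),
    # so harvesting half of it in total keeps the stock stable.
    return stock / 2 / n_agents


def cooperative(stock: float, n_agents: int) -> float:
    # Takes exactly the sustainable share.
    return sustainable_share(stock, n_agents)


def greedy(stock: float, n_agents: int) -> float:
    # Illustrative uncooperative policy: takes three times the share.
    return 3 * sustainable_share(stock, n_agents)


def simulate(policies: List[Policy], capacity: float = 100.0,
             collapse_below: float = 5.0, max_rounds: int = 12) -> RunResult:
    stock = capacity
    overuse_count = 0
    total_requests = 0
    survived = 0
    n = len(policies)
    for _ in range(max_rounds):
        share = sustainable_share(stock, n)
        requests = [policy(stock, n) for policy in policies]
        # Count every request that exceeds the sustainable share.
        overuse_count += sum(r > share + 1e-9 for r in requests)
        total_requests += n
        stock -= min(stock, sum(requests))
        if stock < collapse_below:
            break  # assumed collapse condition
        survived += 1
        stock = min(capacity, 2 * stock)  # assumed regeneration rule
    return RunResult(survived, overuse_count / total_requests)


all_cooperative = simulate([cooperative] * 5)
one_defector = simulate([cooperative] * 4 + [greedy])
print(all_cooperative)
print(one_defector)
```

Under these assumed dynamics, five cooperative agents complete all 12 rounds with no overuse, while replacing a single agent with the greedy policy drains the stock before the horizon. This toy model only illustrates how the metrics could be computed and does not reproduce the paper's environments or its reported numbers.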
