Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents

Zengqing Wu, Run Peng, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Chuan Xiao

TL;DR

This work probes whether competing LLM agents can spontaneously cooperate in social simulations without explicit prompts. Using a minimal, debiased Smart Agent-Based Modeling framework across three benchmark scenarios (Keynesian Beauty Contest, Bertrand competition, and Emergency Evacuation), the authors show that cooperation can emerge gradually through in-context learning and historical interactions, aligning with human data in at least one domain. They perform ablations and cross-model analyses to argue that the observed cooperation is not merely instruction-driven and discuss implications for computational social science and AI evaluation of deliberate reasoning. While acknowledging limitations such as dataset breadth and model variety, the study provides a conceptual and methodological path for evaluating LLMs’ autonomous cooperative capabilities in long-horizon tasks.

Abstract

Large Language Models (LLMs) have increasingly been utilized in social simulations, where they are often guided by carefully crafted instructions to stably exhibit human-like behaviors during simulations. Nevertheless, we doubt the necessity of shaping agents' behaviors for accurate social simulations. Instead, this paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions. We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation, findings that align closely with human behavioral data. This approach not only aids the computational social science community in bridging the gap between simulations and real-world dynamics but also offers the AI community a novel method to assess LLMs' capability of deliberate reasoning.

Paper Structure (62 sections, 20 figures, 4 tables)

Figures (20)

  • Figure 1: (Depicted by GPT-4o) Two potential scenarios during a fire. People might panic and rush into crowds, trying to exit first (left) or may stay calm, keep in line, and encourage others (right). In this study, we explore whether LLM agents can simulate the gradual transition from non-cooperative to cooperative behaviors of agents.
  • Figure 2: Workflow in the three case studies, illustrating how our framework manages LLM agents during simulations. From left to right, the workflow progresses through the communication phase, planning phase, action phase, and update phase. In the Bertrand competition (BC), the order of the communication and planning phases is swapped to align with previous simulations that used human subjects [andres2023communication]. The first three phases each involve one or more LLM queries initiated by the framework. The final phase involves no LLM queries; it only updates the state for each scenario.
  • Figure 3: Illustration of the baseline design in the Keynesian Beauty Contest (KBC) case study. Agents go through $k$ rounds of communication before planning and choosing their numbers.
  • Figure 4: Variance of player choices under different KBC settings. In our baseline setting (curve in blue), we use the GPT-4-0314 model with a temperature of 0.7, without explicit instructions or personas.
  • Figure 5: Distribution of players' choices in our simulations and the results of the New York Times experiment.
  • ...and 15 more figures
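
The four-phase workflow described in the Figure 2 caption (communication, planning, action, update) can be sketched as a simple simulation loop. The sketch below is illustrative only and is not the authors' framework: `StubAgent`, `run_simulation`, the `communication_first` flag, and the "move halfway towards the last target" rule are all hypothetical, and no LLM is queried. The target of two-thirds of the mean is the textbook Keynesian Beauty Contest rule and is assumed here rather than taken from the paper's setup. The per-round variance of choices is recorded because Figure 4 reports that quantity.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubAgent:
    """Hypothetical stand-in for an LLM agent; no model is queried."""
    name: str
    guess: float
    heard: List[str] = field(default_factory=list)

    def communicate(self) -> str:
        # Communication phase: announce the current intention.
        return f"{self.name} is leaning towards {self.guess:.1f}"

    def plan(self, messages: List[str], last_target: Optional[float]) -> None:
        # Planning phase: a real agent would reason over the messages.
        # Toy rule (assumption): move halfway towards the previous target.
        self.heard = list(messages)
        if last_target is not None:
            self.guess += 0.5 * (last_target - self.guess)

    def act(self) -> float:
        # Action phase: commit to a number.
        return self.guess


def run_simulation(agents: List[StubAgent], rounds: int,
                   communication_first: bool = True) -> List[float]:
    """Run the four phases each round; return the variance of choices per round."""
    variances: List[float] = []
    last_target: Optional[float] = None
    messages: List[str] = []
    for _ in range(rounds):
        if communication_first:
            messages = [a.communicate() for a in agents]
            for a in agents:
                a.plan(messages, last_target)
        else:
            # Swapped order, as the caption describes for the BC scenario:
            # agents plan with the previous round's messages, then communicate.
            for a in agents:
                a.plan(messages, last_target)
            messages = [a.communicate() for a in agents]
        choices = [a.act() for a in agents]
        # Update phase: no agent queries, only bookkeeping of shared state.
        last_target = (2 / 3) * statistics.mean(choices)
        variances.append(statistics.pvariance(choices))
    return variances


agents = [StubAgent("A", 10.0), StubAgent("B", 50.0), StubAgent("C", 90.0)]
variances = run_simulation(agents, rounds=5)
print(variances)
```

Because every stub agent moves towards the same target, the spread of choices shrinks each round. This is a property of the toy rule only and says nothing about how LLM agents behave in the paper's experiments.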