Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs: Evidence and Mechanisms for AI Agents to Inform Policy Decisions

Karthik Sreedhar, Alice Cai, Jenny Ma, Jeffrey V. Nickerson, Lydia B. Chilton

TL;DR

The paper investigates whether multi-agent LLM systems can faithfully simulate human prosocial behavior, particularly in the public goods game (PGG), and whether these simulations can reveal "unbounded actions" (behaviors outside a lab task's fixed action set) that are relevant to real-world policy. It demonstrates that LLM-agent ensembles replicate the direction of lab effects for priming, transparency, and endowment variation, and can transfer priming mechanisms from non-PGG contexts to PGG simulations, though magnitudes may differ. Beyond bounded lab tasks, the study shows that unbounded behaviors emerge in scenarios resembling real-world settings when additional mechanisms, such as private communication channels and stake-prompting, are introduced. Because the models are imperfect and still evolving, the results should not be over-relied upon, but these simulations offer a flexible, low-cost tool for ideation and exploration of policy outcomes, complementing traditional experiments with human subjects.

Abstract

Human prosocial cooperation is essential for our collective health, education, and welfare. However, designing social systems to maintain or incentivize prosocial behavior is challenging because people can act selfishly to maximize personal gain. This complex and unpredictable aspect of human behavior makes it difficult for policymakers to foresee the implications of their designs. Recently, multi-agent LLM systems have shown remarkable capabilities in simulating human-like behavior and replicating some human lab experiments. This paper studies how well multi-agent systems can simulate prosocial human behavior, such as that seen in the public goods game (PGG), and whether multi-agent systems can exhibit "unbounded actions" seen outside the lab in real-world scenarios. We find that multi-agent LLM systems successfully replicate human behavior from lab experiments of the public goods game with three experimental treatments: priming, transparency, and varying endowments. Beyond replicating existing experiments, we find that multi-agent LLM systems can replicate the expected human behavior when combining experimental treatments, even if no previous study combined those specific treatments. Lastly, we find that multi-agent systems can exhibit a rich set of unbounded actions that people take in the real world outside of the lab, such as collaborating and even cheating. In sum, these studies are steps towards a future where LLMs can be used to inform policy decisions that encourage people to act in a prosocial manner.
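
For readers unfamiliar with the public goods game, the following is a minimal sketch of the standard linear payoff rule: each player keeps whatever they do not contribute, and the pooled contributions are multiplied and split equally. The function name `pgg_payoffs`, the multiplier of 2.0, and the four-player group are illustrative assumptions, not the settings used in the paper.

```python
from typing import List


def pgg_payoffs(
    endowments: List[float],
    contributions: List[float],
    multiplier: float = 2.0,  # assumed value; not taken from the paper
) -> List[float]:
    """Return each player's payoff for one round of a linear public goods game."""
    if len(endowments) != len(contributions):
        raise ValueError("endowments and contributions must have the same length")
    for endowment, contribution in zip(endowments, contributions):
        if not 0 <= contribution <= endowment:
            raise ValueError("each contribution must lie between 0 and the endowment")

    # The common pool is multiplied, then divided equally among all players.
    share = multiplier * sum(contributions) / len(contributions)

    # Payoff = what the player kept + an equal share of the multiplied pool.
    return [e - c + share for e, c in zip(endowments, contributions)]


# Hypothetical four-player group with $20 endowments.
everyone_contributes = pgg_payoffs([20, 20, 20, 20], [20, 20, 20, 20])
one_free_rider = pgg_payoffs([20, 20, 20, 20], [0, 20, 20, 20])
```

Under these assumed numbers, the free rider earns more than the players who contribute fully, while the group as a whole earns less than under full cooperation. This tension is what the priming, transparency, and endowment treatments are designed to shift.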

Paper Structure

This paper contains 47 sections, 7 figures.

Figures (7)

  • Figure 1: Average contributions for "Teamwork" and "Taxation" priming conditions in simulations with LLM-agents and experiments with human subjects. Average contributions are above 60% in the "Teamwork" priming condition for both groups. Average contributions are below 40% in the "Taxation" priming condition for both groups.
  • Figure 2: Average contributions with and without transparency of contributions for simulations. Experiments with humans show that average contributions in PGGs with transparency of contributions are 6% higher than in PGGs without transparency. In simulations with LLM-agents, average contributions in PGGs with transparency of contributions were 25% higher than in PGGs without transparency. So, the direction of the difference is accurately captured, although the magnitude is not.
  • Figure 3: Average contributions for equal and varied endowment conditions in simulations with LLM-agents and experiments with human subjects. Average contributions are roughly the same for humans and LLM-agents in the equal and varied endowment conditions with $20 and $50 endowments. Average contributions are lower in the varied condition than in the equal condition with $80 endowments for both humans and LLM-agents.
  • Figure 4: Average contributions for either "unity" or "proportionality" priming conditions. Average contributions under the "unity" condition are significantly higher than those under the "proportionality" condition.
  • Figure 5: Average contributions across all five rounds for the "Teamwork" and "Taxation" priming conditions. Average contributions in the fifth round are closer to 50% of the initial endowment, the average amount contributed without any priming in lab experiments. Hence, the effect of priming appears to fade over time in simulations with LLM-agents.
  • ...and 2 more figures