Cultural Evolution of Cooperation among LLM Agents

Aron Vallinder, Edward Hughes

TL;DR

The paper asks whether societies of LLM agents can learn cooperative norms through cultural evolution in a stylized Donor Game. It implements a generational framework in which agents generate strategies and play the game over multiple rounds, and the top performers survive and pass their strategies on to the next generation, enabling strategy transmission and indirect reciprocity. Results show strong model dependence: Claude 3.5 Sonnet reliably evolves cooperation, especially when given an option for costly punishment, while Gemini 1.5 Flash and GPT-4o fail to sustain cooperation, and their average resources may even decline across generations. This work offers a scalable, inexpensive benchmark for long-term cooperative behavior in multi-agent LLM ecosystems and informs safe deployment practices for AI agents at scale.

Abstract

Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a "society" of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. We find that the evolution of cooperation differs markedly across base models, with societies of Claude 3.5 Sonnet agents achieving significantly higher average scores than Gemini 1.5 Flash, which, in turn, outperforms GPT-4o. Further, Claude 3.5 Sonnet can make use of an additional mechanism for costly punishment to achieve yet higher scores, while Gemini 1.5 Flash and GPT-4o fail to do so. For each model class, we also observe variation in emergent behavior across random seeds, suggesting an understudied sensitive dependence on initial conditions. We suggest that our evaluation regime could inspire an inexpensive and informative new class of LLM benchmarks, focussed on the implications of LLM agent deployment for the cooperative infrastructure of society.

Paper Structure

This paper contains 11 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Donor Game with Cultural Evolution. In the first generation, 12 agents are initialized via a strategy prompt which asks them to generate a strategy based on a description of the Donor Game. These agents play 12 rounds of the game, using a donation prompt which provides the donor with information about the recipient's past behavior and current resources. The top 50% of agents (in terms of final resources) survive to the next generation. 6 new agents are initialized for that generation, and the strategy prompt includes the strategies of surviving agents. The new generation plays the Donor Game again, and the whole process is repeated for 10 generations.
  • Figure 2: Cultural evolution of cooperation differs across models. We plot the average final resources across all agents ($y$-axis) per generation ($x$-axis) for three different models (Claude 3.5 Sonnet, Gemini 1.5 Flash, GPT-4o). Each curve averages 5 runs with distinct random seeds for the language models, and the standard error of the mean is shown by shading. There is reliable cultural evolution of cooperation across generations for Claude 3.5 Sonnet but not for Gemini 1.5 Flash or GPT-4o with our prompting strategy.
  • Figure 3: Costly punishment affects cooperation differently across models. We plot the average final resources across all agents ($y$-axis) per generation ($x$-axis) as in Figure 2 but with a different $y$-axis scale. Agents now also have the option to punish a recipient by spending $x$ units to take away $2x$ units. For Claude 3.5 Sonnet, average final resources increase substantially, whereas they decrease substantially for Gemini 1.5 Flash. GPT-4o shows some increase, although small in absolute terms.
  • Figure 4: Five runs of each model. We plot the average final resources ($y$-axis) per generation ($x$-axis) for all five individual runs of each model. Note the different $y$-axis scales. For Claude 3.5 Sonnet, average final resources vary substantially across runs, especially in later generations. All five runs of GPT-4o show average final resources declining across generations (although in absolute terms the change is tiny). Gemini 1.5 Flash behavior also varies substantially across runs, with several runs showing promising increases before a "cooperation crash".
  • Figure 5: Five runs of each model with costly punishment. We plot the average final resources ($y$-axis) per generation ($x$-axis) for all five individual runs of each model with the option of costly punishment. Note the different $y$-axis scales. Relative to the no-punishment condition, a larger number of Claude 3.5 Sonnet runs show substantial improvement with cultural evolution, though there is still large variation. Interestingly, the affordance of costly punishment causes a marked decrease in the resources of Gemini 1.5 Flash agents, since these over-engage in punishment (14.29% of Gemini encounters involved punishment, compared with 1.65% for GPT-4o, and 0.06% for Claude). The availability of costly punishment appears to slightly increase the variance among GPT-4o runs, but there is no sign of emergent cooperation.
  • ...and 6 more figures
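
The generational procedure in the Figure 1 caption, together with the punishment rule in the Figure 3 caption, can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' implementation. The agent, round, generation, and survival counts come from Figure 1, and the "spend $x$ to remove $2x$" rule comes from Figure 3. Everything else is assumed for illustration: the numeric `generosity` value stands in for an LLM-generated strategy, and the initial endowment, the donation multiplier, the random pairing scheme, the per-generation reset of resources, and the clamping of resources at zero are all hypothetical choices.

```python
import random
from dataclasses import dataclass

# Values taken from the Figure 1 caption.
N_AGENTS = 12
N_ROUNDS = 12
N_GENERATIONS = 10
SURVIVAL_FRACTION = 0.5

# Assumptions, not stated in the captions above.
INITIAL_RESOURCES = 10.0
DONATION_MULTIPLIER = 2.0


@dataclass
class Agent:
    # Hypothetical stand-in for an LLM-generated strategy:
    # the fraction of current resources the agent donates.
    generosity: float
    resources: float = INITIAL_RESOURCES


def new_agent(rng, survivors):
    """Create an agent; newcomers see the survivors' strategies (Figure 1)."""
    if survivors:
        base = rng.choice(survivors).generosity
        generosity = min(1.0, max(0.0, base + rng.gauss(0.0, 0.1)))
    else:
        generosity = rng.random()
    return Agent(generosity=generosity)


def punish(punisher, target, x):
    """Figure 3 rule: spend x units to take away 2x units from the target."""
    x = min(x, punisher.resources)
    punisher.resources -= x
    # Clamping at zero is an assumption.
    target.resources = max(0.0, target.resources - 2 * x)


def play_generation(agents, rng):
    """Play N_ROUNDS rounds with random donor/recipient pairs (assumed pairing)."""
    for _ in range(N_ROUNDS):
        order = agents[:]
        rng.shuffle(order)
        for donor, recipient in zip(order[::2], order[1::2]):
            gift = donor.generosity * donor.resources
            donor.resources -= gift
            recipient.resources += DONATION_MULTIPLIER * gift


def run(seed=0):
    """Return the average final resources per generation."""
    rng = random.Random(seed)
    agents = [new_agent(rng, []) for _ in range(N_AGENTS)]
    history = []
    for _ in range(N_GENERATIONS):
        for agent in agents:
            agent.resources = INITIAL_RESOURCES  # assumed reset each generation
        play_generation(agents, rng)
        history.append(sum(a.resources for a in agents) / len(agents))
        # The top 50% by final resources survive; the rest are replaced.
        ranked = sorted(agents, key=lambda a: a.resources, reverse=True)
        survivors = ranked[: int(N_AGENTS * SURVIVAL_FRACTION)]
        newcomers = [new_agent(rng, survivors)
                     for _ in range(N_AGENTS - len(survivors))]
        agents = survivors + newcomers
    return history
```

Calling `run(seed)` returns one value per generation, the same quantity plotted on the $y$-axis of Figures 2 to 5. Varying `seed` gives a rough analogue of the paper's multiple runs, although this toy model says nothing about how any actual LLM behaves.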