Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?

Nicoló Fontana, Francesco Pierri, Luca Maria Aiello

TL;DR

This study examines how three large language models (Llama2, Llama3, and GPT3.5) behave as social agents in the Iterated Prisoner's Dilemma (IPD). It introduces a framework that combines meta-prompting to check prompt comprehension, control of the memory window, and behavioral profiling based on SFEM scores. Using games of 100 rounds, with results aggregated over 100 games, it quantifies cooperation and similarity to known strategies against random opponents with varying levels of hostility. Llama2 and GPT3.5 are generally more cooperative than humans, while Llama3 behaves in a more exploitative, human-like way. The findings underscore the importance of prompt design, memory representation, and game length for reliable LLM auditing and alignment in social contexts. The work provides a reproducible baseline for IPD experiments with LLMs and motivates future cross-model and group-agent studies in synthetic social ecosystems.

Abstract

The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic Game Theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of three LLMs (Llama2, Llama3, and GPT3.5) when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds and analyzed the LLMs' decisions in terms of dimensions defined in the behavioral economics literature. We find that all models tend not to initiate defection but act cautiously, favoring cooperation over defection only when the opponent's defection rate is low. Overall, LLMs behave at least as cooperatively as the typical human player, although our results indicate some substantial differences among models. In particular, Llama2 and GPT3.5 are more cooperative than humans, and especially forgiving and non-retaliatory for opponent defection rates below 30%. More similar to humans, Llama3 exhibits consistently uncooperative and exploitative behavior unless the opponent always cooperates. Our systematic approach to the study of LLMs in game theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.

Paper Structure (21 sections, 3 equations, 14 figures, 1 table)

Figures (14)

  • Figure 1: Accuracy of the models' responses to the prompt comprehension questions defined in Table \ref{tab:meta-prompting}. The questions are categorized into three groups, each assessing a different aspect: the rules of the game, its temporal evolution, and its current state. We show 95% confidence intervals, computed from 100 games.
  • Figure 2: Left: Llama2's probability of cooperation ($p_{coop}$) against an $\mathsf{Always\ Defect}$ opponent, when using a memory window size of 10 vs. including the full game history in the prompt. Right: steady-state probability calculated on the last 10 rounds of $p_{coop}$ for different memory window sizes. We show 95% confidence intervals, computed from 100 games.
  • Figure 3: Models' probability of cooperation ($p_{coop}$) against $\mathsf{Unfair\ Random}$ opponents with increasing cooperation probability $\alpha$. We show 95% confidence intervals, computed from 100 games.
  • Figure 4: SFEM scores quantifying the similarity between the models' sequences of actions and known strategies adopted in the Iterated Prisoner's Dilemma game (defined in §\ref{sec:background:strategies}). The models' behavioral sequences come from games against $\mathsf{Unfair\ Random}$ opponents with increasing cooperation probability $\alpha$. Some SFEM scores are not shown because they are not well-defined for extreme values of $\alpha$.
  • Figure 5: Presence of behavioral traits in the models' actions when playing against $\mathsf{Unfair\ Random}$ opponents with increasing cooperation probability $\alpha$. The values of the same traits calculated for a $\mathsf{Random}$ agent playing against $\mathsf{Unfair\ Random}$ opponents are also reported for comparison. Some traits are not defined for extreme values of $\alpha$. We show 95% confidence intervals, computed from 100 games.
  • ...and 9 more figures
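
The quantities named in the captions above (an $\mathsf{Unfair\ Random}$ opponent with cooperation probability $\alpha$, a memory window, and a steady-state $p_{coop}$ over the last 10 rounds with a 95% confidence interval from 100 games) can be illustrated with a minimal Python sketch. This is not the authors' code. The `stand_in_player` function is a hypothetical tit-for-tat placeholder for the LLM call, and the normal-approximation interval is an assumption, since the captions do not say how the intervals were computed.

```python
import math
import random
from statistics import mean, stdev


def unfair_random(alpha, rng):
    """Opponent that cooperates with probability alpha, ignoring the history."""
    return "C" if rng.random() < alpha else "D"


def stand_in_player(visible_opponent_moves):
    """Hypothetical placeholder for the LLM: tit-for-tat on the visible window."""
    return visible_opponent_moves[-1] if visible_opponent_moves else "C"


def play_game(alpha, n_rounds=100, window=10, seed=0):
    """Play one game; return a list of 1/0 flags for the player's cooperation."""
    rng = random.Random(seed)
    opponent_history = []
    cooperated = []
    for _ in range(n_rounds):
        # Only the last `window` opponent moves are shown to the player.
        visible = opponent_history[-window:] if window > 0 else []
        move = stand_in_player(visible)
        cooperated.append(1 if move == "C" else 0)
        opponent_history.append(unfair_random(alpha, rng))
    return cooperated


def steady_state_p_coop(alpha, n_games=100, last_k=10):
    """Mean cooperation over the last `last_k` rounds across games.

    Returns (mean, half_width), where half_width is a normal-approximation
    95% interval half-width (an assumed method, not taken from the paper).
    """
    per_game = [mean(play_game(alpha, seed=g)[-last_k:]) for g in range(n_games)]
    half_width = 1.96 * stdev(per_game) / math.sqrt(n_games)
    return mean(per_game), half_width


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        p, hw = steady_state_p_coop(alpha)
        print(f"alpha={alpha:.1f}  p_coop={p:.3f} +/- {hw:.3f}")
```

Swapping `stand_in_player` for a function that prompts a model with the visible window would give the same measurement loop. The per-game seeding keeps runs repeatable.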