Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?
Nicoló Fontana, Francesco Pierri, Luca Maria Aiello
TL;DR
This study probes how three large language models behave as social agents in the Iterated Prisoner's Dilemma (IPD), introducing a principled framework that combines meta-prompting, memory-window control, and behavioral profiling based on the Strategy Frequency Estimation Method (SFEM). By simulating 100-round games and repeating each configuration multiple times, it quantifies cooperation patterns and strategy alignment for Llama2, Llama3, and GPT-3.5 against random opponents with varying levels of hostility. It uncovers model-specific tendencies: Llama2 and GPT-3.5 are generally more cooperative than humans, while Llama3 exhibits more exploitative, human-like behavior. The findings underscore the importance of prompt design, memory representation, and experimental duration for reliable LLM auditing and alignment in social contexts. This work provides a reproducible baseline for the IPD with LLMs and motivates future cross-model studies and studies with groups of agents in synthetic social ecosystems.
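SFEM, as it is usually described in the behavioral economics literature, assumes that each game is played with one strategy from a candidate set, executed with a fixed accuracy β, and estimates how often each strategy is used by maximum likelihood. The minimal sketch below fits that mixture with expectation-maximization on synthetic games. The four-strategy set, the shared β with its clamp, and the helpers `sfem` and `generate` are illustrative assumptions, not the authors' implementation, which may use a larger strategy set and a different optimizer.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

C, D = "C", "D"
Game = List[Tuple[str, str]]  # (player_move, opponent_move) for each round

# Candidate strategies map the opponent's past moves to a prescribed move.
# This small set is an illustrative assumption, not the paper's strategy set.
STRATEGIES: Dict[str, Callable[[List[str]], str]] = {
    "AllC": lambda opp: C,
    "AllD": lambda opp: D,
    "TFT": lambda opp: C if not opp else opp[-1],
    "Grim": lambda opp: D if D in opp else C,
}


def matches(game: Game, strategy: Callable[[List[str]], str]) -> int:
    """Count rounds in which the observed move equals the strategy's prescription."""
    opp_moves = [o for _, o in game]
    return sum(1 for t, (a, _) in enumerate(game) if a == strategy(opp_moves[:t]))


def sfem(games: List[Game], iters: int = 200) -> Tuple[Dict[str, float], float]:
    """Hypothetical EM fit of strategy frequencies phi and shared accuracy beta."""
    names = list(STRATEGIES)
    m = [[matches(g, STRATEGIES[k]) for k in names] for g in games]
    n = [len(g) for g in games]
    phi = [1.0 / len(names)] * len(names)
    beta = 0.8
    for _ in range(iters):
        # E-step: posterior probability that game i was played with strategy k.
        resp = []
        for mi, ni in zip(m, n):
            logs = [math.log(p) + mk * math.log(beta) + (ni - mk) * math.log(1 - beta)
                    if p > 0 else -math.inf for p, mk in zip(phi, mi)]
            top = max(logs)
            w = [math.exp(x - top) for x in logs]
            total = sum(w)
            resp.append([x / total for x in w])
        # M-step: update mixture weights and the shared accuracy parameter.
        phi = [sum(r[k] for r in resp) / len(games) for k in range(len(names))]
        matched = sum(r[k] * mi[k] for r, mi in zip(resp, m) for k in range(len(names)))
        beta = min(max(matched / sum(n), 0.5), 0.999)  # clamp keeps the logs finite
    return dict(zip(names, phi)), beta


def generate(strategy: str, rounds: int, opp_defect: float, rng: random.Random) -> Game:
    """Hypothetical data generator: a player follows `strategy` against a random opponent."""
    game: Game = []
    opp: List[str] = []
    for _ in range(rounds):
        move = STRATEGIES[strategy](opp)
        o = D if rng.random() < opp_defect else C
        game.append((move, o))
        opp.append(o)
    return game


if __name__ == "__main__":
    rng = random.Random(42)
    games = [generate("TFT", 100, 0.3, rng) for _ in range(3)]
    games.append(generate("AllD", 100, 0.3, rng))
    phi, beta = sfem(games)
    print({k: round(v, 3) for k, v in phi.items()}, round(beta, 3))
```

On this synthetic data the estimate should place almost all of the weight on TFT and AllD in roughly a 3:1 ratio, mirroring how the games were generated; on real LLM logs, the fitted frequencies would serve as the behavioral profile.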
Abstract
The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic game theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of three LLMs (Llama2, Llama3, and GPT-3.5) when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game rules and its capability to parse historical gameplay logs for decision-making. We simulated games lasting 100 rounds and analyzed the LLMs' decisions along behavioral dimensions defined in the behavioral economics literature. We find that all models tend not to initiate defection but act cautiously, favoring cooperation over defection only when the opponent's defection rate is low. Overall, LLMs behave at least as cooperatively as the typical human player, although our results indicate some substantial differences among models. In particular, Llama2 and GPT-3.5 are more cooperative than humans, and especially forgiving and non-retaliatory for opponent defection rates below 30%. More similar to humans, Llama3 exhibits consistently uncooperative and exploitative behavior unless the opponent always cooperates. Our systematic approach to the study of LLMs in game-theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.
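The experimental setup described in the abstract (100-round games against a random opponent that defects with a fixed probability, followed by behavioral analysis of the agent's moves) can be sketched as below. This is a minimal sketch under stated assumptions: the textbook payoff values, the tit-for-tat stand-in for the LLM (the real experiment queries a model through a prompt), the `window` parameter for limiting visible history, and the simplified "nice" and "retaliation" proxies are all hypothetical and may differ from the paper's definitions.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

C, D = "C", "D"
History = List[Tuple[str, str]]  # (agent_move, opponent_move) for each round

# Assumed textbook payoffs (T=5 > R=3 > P=1 > S=0); the paper's values may differ.
PAYOFFS = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def random_opponent(defection_rate: float, rng: random.Random) -> Callable[[History], str]:
    """Opponent that defects with a fixed probability, ignoring the game history."""
    return lambda _history: D if rng.random() < defection_rate else C


def tit_for_tat(visible: History) -> str:
    """Hypothetical stand-in for the LLM agent: cooperate first, then copy the opponent."""
    return C if not visible else visible[-1][1]


def play(agent: Callable[[History], str], opponent: Callable[[History], str],
         rounds: int = 100, window: Optional[int] = None) -> History:
    """Run one game; the agent sees only the last `window` rounds (None = full history)."""
    assert window is None or window >= 1
    history: History = []
    for _ in range(rounds):
        visible = history if window is None else history[-window:]
        history.append((agent(visible), opponent(history)))
    return history


def behavior_metrics(history: History) -> Dict[str, object]:
    """Simplified behavioral proxies (hypothetical definitions, not the paper's)."""
    agent = [a for a, _ in history]
    opp = [o for _, o in history]
    # Nice: the agent never defects in or before the round of the opponent's first defection.
    first_opp_defection = opp.index(D) if D in opp else len(opp)
    nice = D not in agent[: first_opp_defection + 1]
    # Retaliation: share of opponent defections answered by a defection in the next round.
    replies = [agent[t + 1] for t in range(len(history) - 1) if opp[t] == D]
    retaliation = replies.count(D) / len(replies) if replies else None
    return {
        "cooperation_rate": agent.count(C) / len(agent),
        "nice": nice,
        "retaliation_rate": retaliation,
        "agent_payoff": sum(PAYOFFS[move][0] for move in history),
    }


if __name__ == "__main__":
    # Sweep opponent hostility, as in the abstract's "various levels of hostility".
    for rate in (0.0, 0.1, 0.3, 0.5, 1.0):
        game = play(tit_for_tat, random_opponent(rate, random.Random(0)), window=10)
        print(rate, behavior_metrics(game))
```

Passing the agent a callable that sees only a truncated history keeps the memory-window choice explicit; replacing `tit_for_tat` with a function that formats `visible` into a prompt and parses the model's reply would turn this harness into an LLM experiment.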
