Playing games with Large language models: Randomness and strategy

Alicia Vidler; Toby Walsh

TL;DR

This study probes whether large language models can meaningfully participate in strategic, multi-agent games and how randomness and strategy emerge in such settings. Using a LangChain-enabled framework and GPT-4o-Mini-2024-08-17, it evaluates one-shot and repeated Rock-Paper-Scissors and Prisoner’s Dilemma interactions, revealing systematic biases in randomness and notable loss-averse tendencies over time. The findings show LLMs struggle to produce truly uniform random actions, with RPS converging toward stalemates in repetition and PD outcomes shifting with prompt design, rather than reliably achieving game-theoretic equilibria. These results have implications for deploying multi-agent LLM systems and highlight practical challenges in prompt design, caching, and independent sampling when modeling strategic decision-making.

Abstract

Playing games has a long history of describing intricate interactions in simplified forms. In this paper we explore whether large language models (LLMs) can play games, investigating their capabilities for randomisation and strategic adaptation through both simultaneous and sequential game interactions. We focus on GPT-4o-Mini-2024-08-17 and test two games between LLMs: Rock Paper Scissors (RPS) and a game of strategy, the Prisoner’s Dilemma (PD). LLMs are often described as stochastic parrots, and while they may indeed be parrots, our results suggest that they are not very stochastic: their outputs, when prompted to be random, are often heavily biased. Our research reveals that LLMs appear to develop loss-aversion strategies in repeated games, with RPS converging to stalemate conditions while PD shows systematic shifts between cooperative and competitive outcomes based on prompt design. We detail programmatic tools for independent agent interactions and the agentic AI challenges faced in implementation. We show that LLMs can indeed play games, just not very well. These results have implications for the use of LLMs in multi-agent systems and showcase limitations in current approaches to model output for strategic decision-making.
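One way to make the claim about biased "random" outputs concrete is a goodness-of-fit check of observed RPS moves against a uniform distribution. The sketch below is an illustration only: the abstract does not state which statistical test the authors used, and the function names, the 5% threshold, and the sample counts are all hypothetical.

```python
from collections import Counter

MOVES = ("rock", "paper", "scissors")

# Chi-square critical value for 2 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF2 = 5.991


def chi_square_uniform(moves):
    """Pearson chi-square statistic of `moves` against a uniform split over MOVES."""
    if not moves:
        raise ValueError("need at least one move")
    counts = Counter(moves)
    expected = len(moves) / len(MOVES)
    return sum((counts[m] - expected) ** 2 / expected for m in MOVES)


def looks_biased(moves):
    """True if uniformity is rejected at the 5% level (assumed threshold)."""
    return chi_square_uniform(moves) > CRITICAL_5PCT_DF2


# Hypothetical sample, NOT data from the paper: 100 one-shot moves.
sample = ["rock"] * 60 + ["paper"] * 25 + ["scissors"] * 15
balanced = list(MOVES) * 30
```

Here `looks_biased(sample)` rejects uniformity while `looks_biased(balanced)` does not. The test assumes independent draws, which is why the independent sampling and caching issues mentioned in the TL;DR matter when collecting moves from an LLM.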
