Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing
Nunzio Lorè, Babak Heydari
TL;DR
This study assesses whether large language models (GPT-3.5, GPT-4, LLaMa-2) can engage in strategic decision-making in game-theoretic social dilemmas. It systematically varies game structure (Prisoner's Dilemma, Stag Hunt, Snowdrift, Prisoner's Delight) and contextual framing (international relations, business, environment, team, friendsharing) and analyzes outcomes across 60 scenarios with 300 initializations each. The results show that GPT-3.5 is highly context-sensitive but has limited ability to reason abstractly about strategy; GPT-4 and LLaMa-2 both respond to game structure and context, with LLaMa-2 discriminating more finely between games and GPT-4 behaving in a more binary, structure-driven way. A dominance analysis identifies friendsharing as the most influential context, and the work highlights limitations and framing risks in deploying LLMs for strategic tasks. Overall, the paper cautions against unqualified use of LLMs in strategic reasoning and points to directions for improving contextual robustness and understanding of decision-making mechanisms.
Abstract
This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2, within the framework of game theory. Utilizing four canonical two-player games -- Prisoner's Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight -- we explore how these models navigate social dilemmas, situations where players can either cooperate for a collective benefit or defect for individual gain. Crucially, we extend our analysis to examine the role of contextual framing, such as diplomatic relations or casual friendships, in shaping the models' decisions. Our findings reveal a complex landscape: while GPT-3.5 is highly sensitive to contextual framing, it shows limited ability to engage in abstract strategic reasoning. Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and context, but LLaMa-2 exhibits a more nuanced understanding of the games' underlying mechanics. These results highlight the current limitations and varied proficiencies of LLMs in strategic decision-making, cautioning against their unqualified use in tasks requiring complex strategic reasoning.
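To make the notion of "game structure" concrete, the four games named above can be told apart by the ordering of payoffs in a symmetric two-player game. The sketch below uses the conventional textbook labels R (reward for mutual cooperation), S (sucker's payoff), T (temptation to defect), and P (punishment for mutual defection), and assumes R > P. The function name and the numeric payoffs are illustrative assumptions, not values or code taken from the paper.

```python
def classify_game(R, S, T, P):
    """Classify a symmetric 2x2 social dilemma by its payoff ordering.

    Assumes the textbook convention R > P (mutual cooperation beats
    mutual defection). Ties between T and R, or between P and S, are
    rejected because they fall on the boundary between two games.
    """
    if not R > P:
        raise ValueError("expected R > P")
    if T == R or P == S:
        raise ValueError("tie in payoffs: game is on a boundary")

    greed = T > R  # defecting against a cooperator pays more than cooperating
    fear = P > S   # defecting against a defector pays more than cooperating

    if greed and fear:
        return "Prisoner's Dilemma"   # defection is dominant
    if greed:
        return "Snowdrift"            # best reply is the opposite action
    if fear:
        return "Stag Hunt"            # best reply is the same action
    return "Prisoner's Delight"       # cooperation is dominant


# Hypothetical payoffs chosen only to illustrate each ordering.
examples = {
    "T > R > P > S": dict(R=3, S=0, T=5, P=1),
    "R > T > P > S": dict(R=5, S=0, T=3, P=1),
    "T > R > S > P": dict(R=3, S=1, T=5, P=0),
    "R > T > S > P": dict(R=5, S=1, T=3, P=0),
}

for ordering, payoffs in examples.items():
    print(ordering, "->", classify_game(**payoffs))
```

Under this framing, a model that reasons about structure should change its choice when the payoff ordering changes, whereas a model driven by context would change its choice when only the surrounding story changes.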
