Evaluating LLMs in Open-Source Games

Swadesh Sistla, Max Kleiman-Weiner

TL;DR

The paper studies how LLM agents in open-source games develop cooperative, deceptive, or payoff-driven strategies when their actions are governed by transparent, submitted programs. It introduces the SPARC benchmark to test strategic code understanding and runs dyadic experiments with varying agent objectives in the Iterated Prisoner's Dilemma (IPD) and the Coin Game, showing how agents adapt and approximate program equilibria. Evolutionary analyses using replicator dynamics show that the stability of cooperative and deceptive strategies depends on the environment. Together, these results suggest open-source game frameworks as a promising avenue for steering multi-agent safety and governance in strategic settings.

Abstract

Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages, including interpretability, inter-agent transparency, and formal verifiability; additionally, they enable program equilibria, solutions that leverage the transparency of code and are inaccessible within normal-form settings. We evaluate the capabilities of leading open- and closed-weight LLMs to predict and classify program strategies and evaluate features of the approximate program equilibria reached by LLM agents in dyadic and evolutionary settings. We identify the emergence of payoff-maximizing, cooperative, and deceptive strategies, characterize the adaptation of mechanisms within these programs over repeated open-source games, and analyze their comparative evolutionary fitness. We find that open-source games serve as a viable environment to study and steer the emergence of cooperative strategy in multi-agent dilemmas.
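The evolutionary analysis mentioned above relies on replicator dynamics, in which a strategy's population share grows when its fitness exceeds the population average. The following is a minimal sketch of that update rule, not the authors' implementation: the function names (`replicator_step`, `simulate`), the Euler step size, and the textbook Prisoner's Dilemma payoff matrix are all illustrative assumptions and do not come from the paper.

```python
def replicator_step(shares, payoff, dt=0.1):
    """One Euler step of x_i' = x_i * (f_i - f_bar), followed by renormalization."""
    n = len(shares)
    # Fitness of strategy i is its expected payoff against the current population.
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(shares[i] * fitness[i] for i in range(n))
    updated = [shares[i] + dt * shares[i] * (fitness[i] - mean_fitness)
               for i in range(n)]
    total = sum(updated)
    return [x / total for x in updated]


def simulate(shares, payoff, steps=2000, dt=0.1):
    """Iterate the replicator update for a fixed number of steps."""
    for _ in range(steps):
        shares = replicator_step(shares, payoff, dt)
    return shares


# Assumed payoff matrix (row player's payoff): index 0 = cooperate, 1 = defect.
PD_PAYOFF = [[3.0, 0.0],
             [5.0, 1.0]]

# Start from a population that is 90% cooperators.
final_shares = simulate([0.9, 0.1], PD_PAYOFF)
```

In this one-shot matrix defection strictly dominates, so the cooperator share decays toward zero. In the paper's setting, the strategies would instead be the LLM-written programs and the payoffs would come from their pairwise play.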

Paper Structure

This paper contains 42 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: High-level structure of a repeated open-source game with two players. (1) Each player, an LLM agent in this work, submits a strategy represented in Python code to play the base game (e.g., Iterated Prisoner's Dilemma or Coin Game). (2) Programs read and analyze the other player's strategy and can condition their behavior on that analysis. (3) Programs choose moves in the base game on behalf of the player. (4) Players observe the payoffs and can rewrite their code in the next meta-round.
  • Figure 2: Example snippets from the SPARC benchmark. (left) The Tit-for-Tat strategy is implemented in a short Python script. (right) The same Tit-for-Tat snippet after obfuscation.
  • Figure 3: Cooperative program understanding. Grey dots show individual LLM performance. (left) Chain-of-thought improves classification accuracy for all models (p<0.01; t-test). (middle) Stochastic programs were less likely to be correctly classified (p<0.001; t-test). (right) Masking (removing the name of the program) and Obfuscation (renaming all variable names to random strings) had only a minor impact on LLM prediction accuracy. Error bars show the standard error of the mean.
  • Figure 4: Payoffs across agent type pairings. Average payoffs for each actor type (CPM, DPM, PM) when playing against different opponents across (left) Coin Game and (right) Iterated Prisoner's Dilemma. DPM agents fail to substantially outperform their opponents despite explicit deceptive objectives. Error bars show standard error across 10 independent runs.
  • Figure 5: Strategic Features of Programs in Open-Source Games. Bars show the average percentage of strategic adaptations across all 10 meta-rounds for the different agent pairings. Differing agent types exhibit divergent strategic profiles. Cooperative Payoff Maximization (CPM) agents heavily favor "Counter Measures" and are the primary users of "Direct Imitation". Deceptive Payoff Maximization (DPM) agents show the highest rates of "Exploitation Attempts" and are the only agents to use "Feints". Payoff Maximization (PM) agents are opportunistic, balancing exploitation and defense. These results are similar across the (top) Coin Game and (bottom) IPD. Error bars show the standard error of the mean.
  • ...and 2 more figures
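
The loop described in Figure 1, together with the Tit-for-Tat program mentioned in Figure 2, can be sketched in a few lines of Python. This is a hedged illustration and not the paper's harness: the `strategy(my_history, opp_history, opp_source)` interface, the `load` and `play` helpers, the payoff values, and the naive textual check in `SOURCE_CHECKER` are all assumptions made for the example. It also uses `exec`, which is only acceptable here because the program strings are trusted.

```python
# Assumed IPD payoffs: (row move, column move) -> (row payoff, column payoff).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Cooperate first, then copy the opponent's previous move.
TIT_FOR_TAT = '''
def strategy(my_history, opp_history, opp_source):
    return "C" if not opp_history else opp_history[-1]
'''

# Defect unconditionally.
ALWAYS_DEFECT = '''
def strategy(my_history, opp_history, opp_source):
    return "D"
'''

# Reads the opponent's code: defect if it never mentions a cooperate move,
# otherwise fall back to Tit-for-Tat. The string match is deliberately crude.
SOURCE_CHECKER = '''
def strategy(my_history, opp_history, opp_source):
    if '"C"' not in opp_source:
        return "D"
    return "C" if not opp_history else opp_history[-1]
'''


def load(source):
    """Compile a submitted program and return its strategy function."""
    namespace = {}
    exec(source, namespace)  # trusted input only; unsafe for arbitrary code
    return namespace["strategy"]


def play(source_a, source_b, rounds=10):
    """Run the base game; each program also receives the other's source."""
    prog_a, prog_b = load(source_a), load(source_b)
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Pass copies so a program cannot tamper with the recorded histories.
        move_a = prog_a(list(hist_a), list(hist_b), source_b)
        move_b = prog_b(list(hist_b), list(hist_a), source_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Under these assumptions, `play(TIT_FOR_TAT, ALWAYS_DEFECT)` lets the defector exploit the first round, while `play(SOURCE_CHECKER, ALWAYS_DEFECT)` does not, because the checker inspects the opponent's code before moving. A meta-round in the sense of Figure 1 would wrap `play` in an outer loop where each player rewrites its source after seeing the payoffs.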