Evaluating LLMs in Open-Source Games

Swadesh Sistla; Max Kleiman-Weiner

TL;DR

The paper studies how LLM agents in open-source games develop cooperative, deceptive, or payoff-maximizing strategies when their actions are determined by submitted programs that the other players can read. It introduces the SPARC benchmark to test strategic code understanding and runs dyadic experiments with varying agent objectives in the Iterated Prisoner's Dilemma (IPD) and the Coin Game, showing how agents adapt and approximate program equilibria. Evolutionary analyses using replicator dynamics show that the stability of cooperative and deceptive strategies depends on the environment. Together, these results suggest that open-source games are a promising framework for studying and steering multi-agent behavior in strategic settings, with relevance to safety and governance.

Abstract

Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages, including interpretability, inter-agent transparency, and formal verifiability; additionally, they enable program equilibria, solutions that leverage the transparency of code and are inaccessible within normal-form settings. We evaluate the capabilities of leading open- and closed-weight LLMs to predict and classify program strategies and evaluate features of the approximate program equilibria reached by LLM agents in dyadic and evolutionary settings. We identify the emergence of payoff-maximizing, cooperative, and deceptive strategies, characterize the adaptation of mechanisms within these programs over repeated open-source games, and analyze their comparative evolutionary fitness. We find that open-source games serve as a viable environment to study and steer the emergence of cooperative strategy in multi-agent dilemmas.
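
To make the idea of submitting programs in lieu of actions concrete, here is a minimal Python sketch of a one-shot open-source Prisoner's Dilemma. It is an illustration only, not the paper's implementation: the payoff values, the `act(my_source, opponent_source)` interface, and the names `MIRROR_BOT`, `DEFECT_BOT`, `run_program`, and `play` are all assumptions introduced here.

```python
# Illustrative Prisoner's Dilemma payoffs (assumed values, not from the paper).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Each submission is source text defining act(my_source, opponent_source).
# Hypothetical program: cooperate only with an exact copy of itself.
MIRROR_BOT = '''
def act(my_source, opponent_source):
    return "C" if opponent_source == my_source else "D"
'''

# Hypothetical program: always defect, ignoring the opponent's code.
DEFECT_BOT = '''
def act(my_source, opponent_source):
    return "D"
'''


def run_program(source, opponent_source):
    """Load a submitted program and ask it for a move, given both sources."""
    namespace = {}
    exec(source, namespace)  # acceptable in a toy; never exec untrusted code
    return namespace["act"](source, opponent_source)


def play(source_a, source_b):
    """Each program reads the other's source, then both moves are scored."""
    move_a = run_program(source_a, source_b)
    move_b = run_program(source_b, source_a)
    return PAYOFFS[(move_a, move_b)]


if __name__ == "__main__":
    print(play(MIRROR_BOT, MIRROR_BOT))
    print(play(MIRROR_BOT, DEFECT_BOT))
```

Under these assumed payoffs, two copies of `MIRROR_BOT` cooperate, while any player who submits a different program is met with defection and earns at most the mutual-defection payoff. This is the sense in which code transparency can support a cooperative program equilibrium that is unavailable in the ordinary one-shot game.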
