Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic
Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire, Eliezer Yudkowsky
TL;DR
The work formalizes robust cooperation in the one-shot Prisoner's Dilemma between agents that can read each other's source code, modeling agents as modal statements in Gödel–Löb provability logic. It introduces FairBot and PrudentBot within a general framework of modal agents and shows that provability-based reasoning, via fixed points and Löb's theorem, yields mutual cooperation that is unexploitable and that holds even when the agents' source code differs. The paper analyzes the structural properties of modal agents, proves results about the fixed points that determine the outcomes of their interactions, and discusses fundamental obstacles to defining an optimal strategy, along with the philosophical implications of such reasoning. It also outlines practical limitations, including the artificiality of exchanging source code, and open questions for extending the approach to richer strategic settings.
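The Löbian step behind mutual cooperation of two FairBots can be sketched as follows, in notation of our own choosing: FB_1 and FB_2 are two FairBots whose source code may differ, p and q abbreviate their cooperation statements, and ⊢ denotes provability in the background theory (assumed here to be Peano Arithmetic, with □ its provability predicate). Since a FairBot cooperates exactly when it finds a proof that its opponent cooperates, the two biconditionals in the comments hold by construction.

```latex
% p := "FB_1(FB_2) = C",   q := "FB_2(FB_1) = C"
% By construction:  \vdash p \leftrightarrow \Box q   and   \vdash q \leftrightarrow \Box p
\begin{align*}
&\vdash \Box(p \wedge q) \to (\Box p \wedge \Box q)
  && \text{($\Box$ distributes over $\wedge$)}\\
&\vdash (\Box p \wedge \Box q) \to (q \wedge p)
  && \text{(the two FairBot biconditionals)}\\
&\vdash \Box(p \wedge q) \to (p \wedge q)
  && \text{(chaining the two lines above)}\\
&\vdash p \wedge q
  && \text{(L\"ob's theorem: from $\vdash \Box\varphi \to \varphi$ infer $\vdash \varphi$)}
\end{align*}
```

Nothing in the argument requires FB_1 and FB_2 to be syntactically identical, which is the sense in which the cooperation is robust.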
Abstract
We consider the one-shot Prisoner's Dilemma between algorithms with read-access to one another's source code, and we use the modal logic of provability to build agents that can achieve mutual cooperation in a manner that is robust, in that cooperation does not require exact equality of the agents' source code, and unexploitable, meaning that such an agent never cooperates when its opponent defects. We construct a general framework for such "modal agents" and study their properties.
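To make the idea of a modal agent concrete, here is a minimal sketch under simplifying assumptions of our own; it is not the paper's algorithm. Each agent is a hypothetical Python rule (cooperate_bot, defect_bot, fair_bot, prudent_bot) that decides whether to cooperate at world n of the linear Kripke frame for GL, where □ψ holds at world n iff ψ holds at every world m < n. Because every reference to an opponent's action sits under a □, the value at world n depends only on lower worlds and can be computed bottom-up. We read the outcome off at a fixed world (depth=10, an assumed value), relying on the assumption that the values have stabilized by then. Agents are identified by name, so this toy model does not capture syntactically distinct copies of the same agent.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple

# act(me, opp, n) -> True if agent `me` cooperates with `opp` at world n.
Act = Callable[[str, str, int], bool]
Rule = Callable[[str, str, int, Act], bool]


def box(pred: Callable[[int], bool], n: int) -> bool:
    """Box(psi) holds at world n of the linear GL frame iff psi holds at all worlds m < n."""
    return all(pred(m) for m in range(n))


def cooperate_bot(me: str, opp: str, n: int, act: Act) -> bool:
    return True  # always cooperates


def defect_bot(me: str, opp: str, n: int, act: Act) -> bool:
    return False  # always defects


def fair_bot(me: str, opp: str, n: int, act: Act) -> bool:
    # Cooperate iff it is provable that the opponent cooperates with me.
    return box(lambda m: act(opp, me, m), n)


def prudent_bot(me: str, opp: str, n: int, act: Act) -> bool:
    # Cooperate iff provably (opp cooperates with me) and provably
    # (not Box(False) -> opp defects against DefectBot).
    # Box(False) holds only at world 0, so "not Box(False)" at world m means m > 0.
    return box(lambda m: act(opp, me, m), n) and box(
        lambda m: m == 0 or not act(opp, "DefectBot", m), n
    )


AGENTS: Dict[str, Rule] = {
    "CooperateBot": cooperate_bot,
    "DefectBot": defect_bot,
    "FairBot": fair_bot,
    "PrudentBot": prudent_bot,
}


@lru_cache(maxsize=None)
def act(me: str, opp: str, n: int) -> bool:
    """Truth value of 'me cooperates with opp' at world n, memoized across calls."""
    return AGENTS[me](me, opp, n, act)


def outcome(a: str, b: str, depth: int = 10) -> Tuple[bool, bool]:
    """(a cooperates?, b cooperates?) read off at a world assumed large enough to have stabilized."""
    return act(a, b, depth), act(b, a, depth)


if __name__ == "__main__":
    for a in AGENTS:
        for b in AGENTS:
            print(f"{a} vs {b}: {outcome(a, b)}")
```

In this sketch FairBot and PrudentBot cooperate with themselves and with each other, neither is exploited by DefectBot, FairBot cooperates with CooperateBot, and PrudentBot defects against CooperateBot.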
