Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic

Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire, Eliezer Yudkowsky

TL;DR

The work formalizes robust cooperation in a one-shot Prisoner's Dilemma with access to opponents' code by modeling agents as modal statements in Gödel–Löb provability logic. It introduces FairBot and PrudentBot within a general modal-agent framework, showing that provability-based reasoning yields unexploitable mutual cooperation via fixed points and Löb's theorem, even when agents differ in encoding. The paper analyzes the structural properties of modal agents, proves fixed-point behavior for interactions, and discusses fundamental obstacles to notionally optimal strategies and the philosophical implications of such reasoning. It also outlines practical limitations, including the artificiality of exchanging source code and open questions for extending the approach to richer strategic settings.

Abstract

We consider the one-shot Prisoner's Dilemma between algorithms with read-access to one another's source code, and we use the modal logic of provability to build agents that can achieve mutual cooperation in a manner that is robust, in that cooperation does not require exact equality of the agents' source code, and unexploitable, meaning that such an agent never cooperates when its opponent defects. We construct a general framework for such "modal agents" and study their properties.
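The idea of agents defined by provability statements can be made concrete with a small evaluator. The sketch below is an illustration, not the paper's code. It treats each agent's cooperation as a modal formula and assumes the usual descriptions: FairBot(X) cooperates iff □[X(FairBot) = C], and PrudentBot(X) cooperates iff □[X(PrudentBot) = C] ∧ □[¬□⊥ → X(DefectBot) = D]. It then decides each matchup by evaluating the fixed-point equations world by world on the linear Kripke frame of GL. The tuple encoding and the names `holds`, `solve` and `num_worlds` are hypothetical. For letterless sentences the value at world n settles as n grows and agrees with truth in the standard model; using 20 worlds is an assumption that is ample for these small formulas.

```python
# Minimal sketch (not the paper's code): decide matchups between modal agents by
# evaluating fully modalized fixed-point equations on the linear Kripke frame of GL.
from typing import Dict, List, Tuple

Formula = Tuple  # hypothetical AST: nested tuples such as ("box", ("var", "p"))

TOP: Formula = ("top",)
BOT: Formula = ("bot",)

def var(name: str) -> Formula: return ("var", name)
def neg(f: Formula) -> Formula: return ("not", f)
def conj(f: Formula, g: Formula) -> Formula: return ("and", f, g)
def imp(f: Formula, g: Formula) -> Formula: return ("imp", f, g)
def box(f: Formula) -> Formula: return ("box", f)

def holds(f: Formula, n: int, worlds: List[Dict[str, bool]]) -> bool:
    """Truth of f at world n of the frame 0 < 1 < 2 < ...; box f holds at n
    iff f holds at every world m < n (so box f is vacuously true at world 0)."""
    tag = f[0]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "var":
        # Only reached at already-computed worlds, since variables sit under a box.
        return worlds[n][f[1]]
    if tag == "not":
        return not holds(f[1], n, worlds)
    if tag == "and":
        return holds(f[1], n, worlds) and holds(f[2], n, worlds)
    if tag == "imp":
        return (not holds(f[1], n, worlds)) or holds(f[2], n, worlds)
    if tag == "box":
        return all(holds(f[1], m, worlds) for m in range(n))
    raise ValueError(f"unknown connective {tag!r}")

def solve(system: Dict[str, Formula], num_worlds: int = 20) -> Dict[str, bool]:
    """Compute every variable world by world; world n depends only on worlds m < n.
    Returns the values at the last world, assuming they have stabilized by then."""
    worlds: List[Dict[str, bool]] = []
    for n in range(num_worlds):
        worlds.append({v: holds(f, n, worlds) for v, f in system.items()})
    return worlds[-1]

# Hypothetical encodings (True = C, False = D); "FB(PB)" means FairBot's move vs PrudentBot.
# CooperateBot always cooperates (TOP); DefectBot always defects (BOT).
fb_vs_fb = solve({"FB(FB)": box(var("FB(FB)"))})
fb_vs_db = solve({"FB(DB)": box(BOT)})  # DB(FB) = C is BOT
# CB(PB) = C is TOP, and CB(DB) = D is BOT.
pb_vs_cb = solve({"PB(CB)": conj(box(TOP), box(imp(neg(box(BOT)), BOT)))})
pb_vs_fb = solve({
    "PB(FB)": conj(box(var("FB(PB)")), box(imp(neg(box(BOT)), neg(var("FB(DB)"))))),
    "FB(PB)": box(var("PB(FB)")),
    "FB(DB)": box(BOT),
})
print(fb_vs_fb["FB(FB)"], fb_vs_db["FB(DB)"], pb_vs_cb["PB(CB)"],
      pb_vs_fb["PB(FB)"], pb_vs_fb["FB(PB)"])  # prints: True False False True True
```

In this sketch FairBot cooperates with itself but not with DefectBot, and PrudentBot defects against CooperateBot while still cooperating with FairBot. Evaluating world by world works only because every variable occurs under a box, so each world's values are fixed by the earlier ones.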

Paper Structure

This paper contains 7 sections, 13 theorems, 16 equations, 6 algorithms.

Key Result

Theorem 1.1

Let $\textsf{S}$ be a formal system which includes Peano Arithmetic. For any well-formed formula $\phi$ of $\textsf{S}$, let $\Box \phi$ be the formula, under a Gödel encoding of $\textsf{S}$, which asserts that there exists a proof of $\phi$ in $\textsf{S}$. Then whenever $\textsf{S}\vdash (\Box \phi \to\phi)$, in fact $\textsf{S}\vdash \phi$.
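To show how Löb's theorem yields cooperation, here is a sketch of the idealized argument for FairBot playing itself. It assumes FairBot searches for proofs without any length bound and that $\textsf{S}$ can verify its code. Let $p$ abbreviate the sentence "FairBot(FairBot) = C"; FairBot cooperates exactly when $\textsf{S}$ proves that its opponent cooperates with it, so:

$$
\begin{aligned}
&\textsf{S} \vdash p \leftrightarrow \Box p && \text{(FairBot's definition, applied to itself)}\\
&\textsf{S} \vdash \Box p \to p && \text{(one direction of the biconditional)}\\
&\textsf{S} \vdash p && \text{(Löb's theorem)}
\end{aligned}
$$

Since $p$ is provable, $\Box p$ is true, so FairBot finds the proof and cooperates with itself. An agent with a bounded proof search needs a more careful argument.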

Theorems & Definitions (26)

  • Theorem 1.1: Löb's Theorem
  • Theorem 3.1
  • proof : Proof (Simple Version):
  • proof : Proof of Theorem 3.1 (Real Version):
  • Theorem 3.2
  • proof
  • Theorem 4.1: Arithmetic soundness of GL
  • proof
  • Theorem 4.2: Modal fixed point theorem
  • proof
  • ...and 16 more