Computing Optimal Equilibria in Repeated Games with Restarts

Ratip Emin Berker, Vincent Conitzer

TL;DR

The paper studies cooperation in infinitely repeated two-player games in which a player can switch partners (rematch) at any time. It introduces restart equilibria, in which the agreed sequence of actions consists of a hazing period followed by an infinitely repeated goal value. Using limit-utility analysis, the authors group sequences into equivalence classes and define OptRep, the problem of computing a representative of the optimal class. They prove that OptRep is (weakly) NP-hard yet solvable in pseudo-polynomial time by a dynamic program (DP). They also give an integer linear program (ILP) and a fully polynomial-time approximation scheme (FPTAS), and report empirical runtimes indicating that the DP and ILP are efficient in practice. The results are relevant to cooperative AI agents and to other systems in which anonymous agents drawn from a large pool interact repeatedly in pairs.

Abstract

Infinitely repeated games can support cooperative outcomes that are not equilibria in the one-shot game. The idea is to make sure that any gains from deviating will be offset by retaliation in future rounds. However, this model of cooperation fails in anonymous settings with many strategic agents that interact in pairs. Here, a player can defect and then avoid penalization by immediately switching partners. In this paper, we focus on a specific set of equilibria that avoids this pitfall. In them, agents follow a designated sequence of actions, and restart if their opponent ever deviates. We show that the socially-optimal sequence of actions consists of an infinitely repeating goal value, preceded by a hazing period. We introduce an equivalence relation on sequences and prove that the computational problem of finding a representative from the optimal equivalence class is (weakly) NP-hard. Nevertheless, we present a pseudo-polynomial time dynamic program for this problem, as well as an integer linear program, and show they are efficient in practice. Lastly, we introduce a fully polynomial-time approximation scheme that outputs a hazing sequence with arbitrarily small approximation ratio.
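The restart mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that each action is a pair (p, p_star), where p is the payoff when both players follow the sequence and p_star is the best payoff available from deviating, and that payoffs are discounted by a factor beta. It further assumes that a sequence is stable when, at every step, the discounted value of continuing is at least p_star plus the discounted value of restarting the sequence with a new partner. The function names and the example payoffs are hypothetical.

```python
from typing import List, Tuple

# Assumed encoding: (p, p_star) = (payoff on the agreed path, best payoff from deviating).
Action = Tuple[float, float]


def continuation_values(hazing: List[Action], goal: Action, beta: float) -> List[float]:
    """Discounted value of following the sequence from each step onward.

    The sequence is a finite hazing period followed by the goal action forever.
    The last entry of the result is the value of the goal phase.
    """
    goal_value = goal[0] / (1.0 - beta)  # geometric series for the repeated goal payoff
    values = [goal_value]
    for p, _ in reversed(hazing):
        values.append(p + beta * values[-1])
    values.reverse()
    return values


def is_stable(hazing: List[Action], goal: Action, beta: float) -> bool:
    """Check the assumed incentive condition at every step of the sequence."""
    values = continuation_values(hazing, goal, beta)
    restart_value = beta * values[0]  # deviator restarts from step 0 next round
    steps = list(hazing) + [goal]
    return all(v >= p_star + restart_value for v, (_, p_star) in zip(values, steps))


# Hypothetical payoffs: a cooperative goal action and a low-payoff hazing action.
cooperate: Action = (3.0, 5.0)
haze: Action = (1.0, 1.0)

print(is_stable([], cooperate, 0.9))
print(is_stable([haze, haze], cooperate, 0.9))
```

Under these assumptions, cooperating from the first round is not stable, because a player gains by deviating and immediately rematching. Prepending two hazing rounds makes restarting costly enough to remove that incentive.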

Paper Structure

This paper contains 22 sections, 11 theorems, 21 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

For any $G=(\{(p^{(j)},p^{*(j)})\}_{j\in [n]}, \beta)$, if there is any stable sequence, then an optimal sequence exists. This sequence is not necessarily unique.

Figures (2)

  • Figure 1: Semi-log plots of the runtimes of the DP, ILP, and FPTAS algorithms (the FPTAS with $\varepsilon = 0.3$, $0.2$, and $0.1$). Each data point is averaged over 5000 trials. Top Row: Fixed number of actions ($n$); $x$-axis shows Maximum Payoff from Deviation (MPD). Bottom Row: Fixed MPD; $x$-axis shows $n$.
  • Figure 2: Runtimes of the DP and ILP algorithms for a fixed number of actions (30) as a function of Maximum Payoff from Deviation (MPD). Each data point is averaged over 5000 trials.

Theorems & Definitions (28)

  • Definition 1: Optimal sequence
  • Proposition 1
  • proof
  • Claim 1
  • Lemma 1
  • Lemma 2
  • Definition 2: Limit-utility equivalence
  • Proposition 2
  • proof
  • Proposition 3
  • ...and 18 more