Computing Optimal Equilibria in Repeated Games with Restarts
Ratip Emin Berker, Vincent Conitzer
TL;DR
The paper studies cooperation in infinitely repeated two-player games in which players can rematch with new partners. It focuses on restart equilibria, in which agents follow a designated sequence of actions and restart it if their opponent deviates, and shows that the socially optimal sequence consists of a hazing period followed by an infinitely repeated goal value. Using limit-utility analysis and an equivalence relation on sequences, it formalizes the problem of finding a representative of the optimal equivalence class (OptRep) and proves that this problem is (weakly) NP-hard. It nevertheless gives a pseudo-polynomial-time dynamic program, an integer linear program, and an FPTAS, and reports empirical runtimes indicating that the exact methods are efficient in practice. The results support computable cooperative equilibria in anonymous partner-switching environments, with relevance to cooperative AI agents and other systems in which pairwise interactions repeat within large pools.
Abstract
Infinitely repeated games can support cooperative outcomes that are not equilibria in the one-shot game. The idea is to ensure that any gains from deviating are offset by retaliation in future rounds. However, this model of cooperation fails in anonymous settings with many strategic agents that interact in pairs. Here, a player can defect and then avoid penalization by immediately switching partners. In this paper, we focus on a specific set of equilibria that avoids this pitfall. In these equilibria, agents follow a designated sequence of actions and restart it if their opponent ever deviates. We show that the socially optimal sequence of actions consists of an infinitely repeating goal value, preceded by a hazing period. We introduce an equivalence relation on sequences and prove that the computational problem of finding a representative from the optimal equivalence class is (weakly) NP-hard. Nevertheless, we present a pseudo-polynomial time dynamic program for this problem, as well as an integer linear program, and show that they are efficient in practice. Lastly, we introduce a fully polynomial-time approximation scheme that outputs a hazing sequence with arbitrarily small approximation ratio.
