Efficient Stackelberg Strategies for Finitely Repeated Games
Natalie Collina, Eshwar Ram Arunachaleswaran, Michael Kearns
TL;DR
The paper studies how to compute Stackelberg equilibria in finitely repeated, undiscounted two-player games in which the Leader commits to an algorithm (a GPA) that can react to the history of play. It introduces an LP-based upper bound on the Leader’s average payoff and constructs concrete GPAs that realize near-optimal transcripts, using threat strategies to enforce Follower compliance. Two efficient algorithms are developed: a deterministic LP-based approach with an $O\left(\frac{1}{T}\right)$ rate, at the cost of an exponential dependence on the number of actions, and a randomized approach with an $O\left(T^{-0.25}\right)$ rate that has no dependence on the number of actions. Both run in polynomial time. A hardness result shows that with three players, approximating the Stackelberg value of a finitely repeated game is NP-hard, via a reduction from BALANCED-VC (balanced vertex cover). Together, these results delineate what is and is not computationally feasible when computing Stackelberg GPAs in finite-horizon settings, and they highlight a separation between repeated and single-shot Stackelberg performance in canonical examples such as the Prisoner’s Dilemma.
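To make the trade-off between the two rates concrete, the following back-of-the-envelope calculation reads each rate as an upper bound on the approximation gap $\varepsilon$. The constants $c_1$ and $c_2$ are placeholders introduced here for illustration, not values from the paper; the summary only indicates that $c_1$ may grow exponentially with the number of actions, while $c_2$ does not depend on it.

```latex
\[
\frac{c_1}{T} \le \varepsilon \iff T \ge \frac{c_1}{\varepsilon},
\qquad
\frac{c_2}{T^{1/4}} \le \varepsilon \iff T \ge \left(\frac{c_2}{\varepsilon}\right)^{4},
\]
\[
\frac{c_1}{T} < \frac{c_2}{T^{1/4}} \iff T > \left(\frac{c_1}{c_2}\right)^{4/3}.
\]
```

Under these assumptions, the deterministic bound is the tighter one only once the horizon exceeds roughly $(c_1/c_2)^{4/3}$, so for games with many actions and a moderate horizon the randomized bound may be the more informative of the two.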
Abstract
We study Stackelberg equilibria in finitely repeated games, where the leader commits to a strategy that picks actions in each round and can be adaptive to the history of play (i.e. they commit to an algorithm). In particular, we study static repeated games with no discounting. We give efficient algorithms for finding approximate Stackelberg equilibria in this setting, along with rates of convergence depending on the time horizon $T$. In many cases, these algorithms allow the leader to do much better on average than they can in the single-round Stackelberg. We give two algorithms, one computing strategies with an optimal $\frac{1}{T}$ rate at the expense of an exponential dependence on the number of actions, and another (randomized) approach computing strategies with no dependence on the number of actions but a worse dependence on $T$ of $\frac{1}{T^{0.25}}$. Both algorithms build upon a linear program to produce simple automata leader strategies and induce corresponding automata best-responses for the follower. We complement these results by showing that approximating the Stackelberg value in three-player finite-horizon repeated games is a computationally hard problem via a reduction from balanced vertex cover.
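The gap between single-round and repeated commitment can be illustrated on a small Prisoner’s Dilemma. The sketch below is not the paper’s LP-based algorithm: the payoff numbers are assumed textbook values, `grim_trigger` is one hand-picked threat strategy, and the follower’s best response is found by brute force over all $2^T$ action sequences, which is only feasible for tiny horizons. Because the leader’s strategy here is deterministic, it suffices to search over fixed follower action sequences.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect

# Assumed textbook payoffs: PAYOFF[(a, b)] = (leader payoff, follower payoff)
# when the leader plays a and the follower plays b.
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def single_round_value(grid=101):
    """Leader's value when committing to a mixed action in the one-shot game."""
    best = float("-inf")
    for i in range(grid):
        p = i / (grid - 1)  # probability that the leader cooperates
        outcomes = []
        for b in (C, D):
            lead = p * PAYOFF[(C, b)][0] + (1 - p) * PAYOFF[(D, b)][0]
            foll = p * PAYOFF[(C, b)][1] + (1 - p) * PAYOFF[(D, b)][1]
            outcomes.append((foll, lead))
        # Follower best-responds; ties are broken in the leader's favor.
        _, lead = max(outcomes)
        best = max(best, lead)
    return best


def grim_trigger(follower_history):
    """Illustrative threat: cooperate until the follower defects, then defect forever."""
    return D if D in follower_history else C


def play(leader_strategy, follower_actions):
    """Total (leader, follower) payoffs against a fixed follower action sequence."""
    lead_total = foll_total = 0
    for t, b in enumerate(follower_actions):
        a = leader_strategy(follower_actions[:t])  # leader sees only past rounds
        lead, foll = PAYOFF[(a, b)]
        lead_total += lead
        foll_total += foll
    return lead_total, foll_total


def repeated_value(leader_strategy, T):
    """Average (leader, follower) payoffs under a brute-force follower best response."""
    # Reverse each result to (follower, leader) so max() prefers the follower's
    # total first and breaks ties in the leader's favor.
    foll_total, lead_total = max(
        play(leader_strategy, seq)[::-1] for seq in product((C, D), repeat=T)
    )
    return lead_total / T, foll_total / T


if __name__ == "__main__":
    print(single_round_value())             # 1.0
    print(repeated_value(grim_trigger, 8))  # (2.625, 3.25)
```

With these assumed payoffs, defecting is dominant for the follower in the one-shot game, so the leader’s single-round value is 1. Against the grim-trigger commitment with $T = 8$, the follower’s best response is to cooperate until the final round and defect only then, which gives the leader an average payoff of $3 - 3/T = 2.625$.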
