Efficient Stackelberg Strategies for Finitely Repeated Games

Natalie Collina; Eshwar Ram Arunachaleswaran; Michael Kearns

TL;DR

The paper studies the computation of Stackelberg equilibria in finitely repeated, non-discounted two-player games in which the Leader commits to a Game Playing Algorithm (GPA) that can react to the history of play. It introduces an LP-based upper bound on the Leader’s average payoff and constructs concrete GPAs that realize near-optimal transcripts, using threat strategies to enforce Follower compliance. Two efficient algorithms are developed: a deterministic LP-based approach with an $O\left(\frac{1}{T}\right)$ convergence rate, at the cost of an exponential dependence on the number of actions, and a randomized approach with an $O\left(T^{-0.25}\right)$ rate that has no dependence on the number of actions. A hardness result shows that in three-player finitely repeated games, approximating the Stackelberg value is NP-hard, via a reduction from BALANCED-VC. Together, these results delineate when Stackelberg GPAs can and cannot be computed efficiently in finite-horizon settings, and canonical examples such as the Prisoner’s Dilemma show that the Leader can do strictly better in the repeated game than in the single-shot Stackelberg game.

Abstract

We study Stackelberg equilibria in finitely repeated games, where the leader commits to a strategy that picks actions in each round and can be adaptive to the history of play (i.e. they commit to an algorithm). In particular, we study static repeated games with no discounting. We give efficient algorithms for finding approximate Stackelberg equilibria in this setting, along with rates of convergence depending on the time horizon $T$. In many cases, these algorithms allow the leader to do much better on average than they can in the single-round Stackelberg. We give two algorithms, one computing strategies with an optimal $\frac{1}{T}$ rate at the expense of an exponential dependence on the number of actions, and another (randomized) approach computing strategies with no dependence on the number of actions but a worse dependence on $T$ of $\frac{1}{T^{0.25}}$. Both algorithms build upon a linear program to produce simple automata leader strategies and induce corresponding automata best-responses for the follower. We complement these results by showing that approximating the Stackelberg value in three-player finite-horizon repeated games is a computationally hard problem via a reduction from balanced vertex cover.
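
As an illustration of the gap between the repeated and single-round settings, the following minimal sketch compares the leader's single-shot commitment value in a Prisoner's Dilemma with the average payoff of a committed grim-trigger strategy over $T$ rounds. The payoff numbers, the grim-trigger leader, and all function names are assumptions chosen for illustration; they are not taken from the paper, and this is not the paper's LP-based construction. Because the leader here is deterministic, enumerating follower action sequences is enough to find a follower best response.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect

# Hypothetical Prisoner's Dilemma payoffs (illustrative, not from the paper).
# PAYOFF[(leader_action, follower_action)] = (leader_payoff, follower_payoff)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def single_shot_value():
    """Leader's best pure commitment when the follower best-responds once."""
    best = None
    for a in (C, D):
        top = max(PAYOFF[(a, b)][1] for b in (C, D))
        # Ties among follower best responses are broken in the leader's favor.
        value = max(PAYOFF[(a, b)][0] for b in (C, D) if PAYOFF[(a, b)][1] == top)
        best = value if best is None else max(best, value)
    return best


def grim_trigger(follower_history):
    """Assumed leader strategy: cooperate until the follower defects once."""
    return C if D not in follower_history else D


def play(follower_actions):
    """Total payoffs when the follower plays a fixed action sequence."""
    leader_total = follower_total = 0
    for t, b in enumerate(follower_actions):
        a = grim_trigger(follower_actions[:t])
        u1, u2 = PAYOFF[(a, b)]
        leader_total += u1
        follower_total += u2
    return leader_total, follower_total


def repeated_value(T):
    """Leader's average payoff against a follower best response (brute force)."""
    outcomes = [play(seq) for seq in product((C, D), repeat=T)]
    top = max(f for _, f in outcomes)
    return max(l for l, f in outcomes if f == top) / T


if __name__ == "__main__":
    print("single-shot value:", single_shot_value())
    for T in (2, 4, 8):
        print("T =", T, "average leader payoff:", repeated_value(T))
```

In this toy instance the follower's best response is to cooperate until the final round and defect only then, so the leader loses a constant amount of payoff in one round and the per-round loss shrinks as $T$ grows. The enumeration is exponential in $T$ and is meant only for small horizons.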

Paper Structure (30 sections, 21 theorems, 112 equations, 1 table)

Introduction
Our Techniques
Related Work
Notation and Preliminaries
Separation of Repeated SE from Single-Round SE
Algorithms for Approximate Stackelberg Equilibrium
An LP Upper Bound
Construction of a Stackelberg GPA from the LP
Randomization for Faster Convergence
Hardness of Computing An Approximate Stackelberg GPA in 3-Player Games
Supplementary Related Work
Proofs from Section \ref{sec:prelims}
Proof of Lemma \ref{lemma:bre}
Proof of Theorem \ref{thm:compact_existence}
Proofs from Section \ref{sec:separation}
...and 15 more sections

Key Result

Lemma 1

For any Leader GPA by Player 1, there exists a best-response Game Playing Algorithm (GPA) by Player 2 within the set of deterministic lookup-table GPAs, which is a finite set. Therefore, a best response is well-defined in the GPA space.
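
The finiteness claim in Lemma 1 can be made concrete with a brute-force sketch: for a tiny horizon, list every deterministic lookup table (one follower action per possible history) and pick the one with the highest follower payoff. The game, the horizon $T = 2$, the grim-trigger leader, and the helper names below are all hypothetical choices for illustration, and the leader is assumed deterministic to keep the simulation simple; the lemma itself covers any Leader GPA.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect

# Hypothetical payoffs: PAYOFF[(leader, follower)] = (leader_u, follower_u)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def histories(T):
    """All joint-action histories of length 0, 1, ..., T - 1."""
    joint = list(product((C, D), repeat=2))
    out = []
    for t in range(T):
        out.extend(product(joint, repeat=t))
    return out


def leader_gpa(history):
    """Assumed leader GPA: defect forever once the follower has defected."""
    return D if any(b == D for _, b in history) else C


def run(table, T):
    """Play T rounds; the follower reads its action from the lookup table."""
    history = ()
    leader_total = follower_total = 0
    for _ in range(T):
        a = leader_gpa(history)
        b = table[history]
        u1, u2 = PAYOFF[(a, b)]
        leader_total += u1
        follower_total += u2
        history += ((a, b),)
    return leader_total, follower_total


def best_response(T):
    """Enumerate every deterministic lookup table and return a best one."""
    hs = histories(T)
    tables = [dict(zip(hs, acts)) for acts in product((C, D), repeat=len(hs))]
    scored = [(run(tab, T), tab) for tab in tables]
    top = max(score[1] for score, _ in scored)
    # Among follower-optimal tables, break ties in the leader's favor.
    (leader_total, follower_total), table = max(
        ((score, tab) for score, tab in scored if score[1] == top),
        key=lambda item: item[0][0],
    )
    return len(tables), leader_total, follower_total, table


if __name__ == "__main__":
    count, leader_total, follower_total, table = best_response(2)
    print(count, "tables; leader total", leader_total,
          "; follower total", follower_total)
```

With two actions per player and $T = 2$ there are 5 histories and hence $2^5 = 32$ lookup tables, so the maximum is taken over a finite set. The number of tables grows doubly exponentially in $T$, so this enumeration only demonstrates that a best response exists, not how to compute one efficiently.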

Theorems & Definitions (42)

Definition 1: Bimatrix Games
Definition 2: Nash Equilibria in Single-Shot Games
Definition 3: Stackelberg Equilibria in Single-Shot Games
Definition 4: Repeated Bimatrix Game
Definition 5: Game Playing Algorithm
Lemma 1
Theorem 1
Definition 6: Approximate Stackelberg GPA
Lemma 2
Theorem 2
...and 32 more
