Convex-Concave Zero-sum Markov Stackelberg Games

Denizalp Goktas; Arjun Prakash; Amy Greenwald

TL;DR

This paper develops policy gradient methods that solve zero-sum Markov Stackelberg games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play, and proves that, when the games are convex-concave, these algorithms converge to Stackelberg equilibrium in polynomial time.

Abstract

Zero-sum Markov Stackelberg games can be used to model myriad problems, in domains ranging from economics to human-robot interaction. In this paper, we develop policy gradient methods that solve these games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play. When the games are convex-concave, we prove that our algorithms converge to Stackelberg equilibrium in polynomial time. We also show that reach-avoid problems are naturally modeled as convex-concave zero-sum Markov Stackelberg games, and that Stackelberg equilibrium policies are more effective than their Nash counterparts in these problems.

Paper Structure (23 sections, 12 theorems, 28 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Min-Max Optimization with Coupled Constraints
Policy Gradient in Convex-Concave Zero-Sum Markov Stackelberg Games
Application: Reach-Avoid Problems
Acknowledgments
Preliminaries
Related Work
Related Work
Omitted Algorithm Details
Omitted Proofs
Algorithm for Convex-Concave Min-Max Stackelberg Games
Fixed Learning Rate
Decreasing Learning Rate
Strongly-Convex Case.
...and 8 more sections

Key Result

Theorem 3.1

Let $(\mathcal{X}, \mathcal{Y}, \obj, \constr)$ be a convex-concave min-max Stackelberg game for which the smoothness assumption (assum:smooth) holds. For any $\varepsilon, \delta \geq 0$, if nested SGDA (resp. saddle-point-oracle SGD) is run with inputs that satisfy, for all $t \in \mathbb{N}_+$, $\learnrate[\outer]
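To make the idea of a nested stochastic gradient descent-ascent scheme concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the objective, the constraint, the step sizes, and the function name `nested_sgda` are all assumptions chosen for illustration, and the inner problem is solved by a primal-dual iteration on the Lagrangian, which is one of several possible choices. The toy game is min over x in [0, 1] of max over y in [0, 1] subject to y <= x of f(x, y) = (x - 1)^2 + y - y^2/4. Here the follower's best response is y = x, the leader's value is V(x) = 0.75 x^2 - x + 1, and the Stackelberg solution is x = y = 2/3.

```python
import random


def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def nested_sgda(x0=0.2, outer_steps=300, inner_steps=200,
                lr_outer=0.1, lr_inner=0.2, noise=0.01, seed=0):
    """Toy nested stochastic gradient descent-ascent (illustrative only).

    Hypothetical game: f(x, y) = (x - 1)**2 + y - y**2 / 4,
    coupled constraint g(x, y) = x - y >= 0, with x, y in [0, 1].
    """
    rng = random.Random(seed)
    x, y, lam = x0, 0.0, 0.0  # lam is the multiplier on g(x, y) >= 0
    for _ in range(outer_steps):
        # Inner loop: approximate the follower's constrained best response
        # via simultaneous ascent in y and descent in lam on the
        # Lagrangian f(x, y) + lam * (x - y).
        for _ in range(inner_steps):
            # Noisy estimate of d/dy [f + lam * g].
            grad_y = 1.0 - y / 2.0 - lam + rng.gauss(0.0, noise)
            slack = x - y  # d/dlam [f + lam * g]
            y = clip(y + lr_inner * grad_y, 0.0, 1.0)
            lam = max(0.0, lam - lr_inner * slack)
        # Outer step: the envelope theorem gives
        # dV/dx = df/dx + lam * dg/dx, with dg/dx = 1 here.
        grad_x = 2.0 * (x - 1.0) + lam + rng.gauss(0.0, noise)
        x = clip(x - lr_outer * grad_x, 0.0, 1.0)
    return x, y, lam


if __name__ == "__main__":
    x, y, lam = nested_sgda()
    print(f"leader x = {x:.3f}, follower y = {y:.3f}, multiplier = {lam:.3f}")
```

The multiplier term in the outer gradient is what distinguishes the coupled-constraint setting from an unconstrained min-max problem: dropping `lam` from `grad_x` would drive the leader toward x = 1 rather than 2/3 in this toy example.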

Figures (4)

Figure 1: One run of the reach-avoid game. The protagonist (blue) attempts to reach the target set (green), while the antagonist (red) tries to prevent the protagonist from doing so. The dotted line around the antagonist represents the avoid set.
Figure 2: The average Bellman error of the Stackelberg version converges close to zero, compared to the average Bellman error of the Nash version.
Figure 3: Both figures show an example evaluation run. The Stackelberg policy successfully executes the swerve, while the Nash policy does not.
Figure 4: Both figures show an example evaluation run with the same obstacle configuration. The Stackelberg variant successfully reaches the goal, while the Nash variant avoids the first obstacle but collides with the second.

Theorems & Definitions (20)

Theorem 3.1
Theorem 4.1
Theorem 5.1
Lemma 1: Convex-Concave Assumption
proof
Lemma 2: Alternative Convex-Concave Assumption
Lemma 3: Gradient Approximation Error
proof: Proof of Lemma 3 (lemma:error_bound)
Theorem D.1
proof: Proof of Theorem D.1 (thm:min_max_convergence)
...and 10 more
