
Convex-Concave Zero-sum Markov Stackelberg Games

Denizalp Goktas, Arjun Prakash, Amy Greenwald

TL;DR

This paper develops policy gradient methods that solve zero-sum Markov Stackelberg games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play, and proves that their algorithms converge to Stackelberg equilibrium in polynomial time.

Abstract

Zero-sum Markov Stackelberg games can be used to model myriad problems, in domains ranging from economics to human-robot interaction. In this paper, we develop policy gradient methods that solve these games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play. When the games are convex-concave, we prove that our algorithms converge to Stackelberg equilibrium in polynomial time. We also show that reach-avoid problems are naturally modeled as convex-concave zero-sum Markov Stackelberg games, and that Stackelberg equilibrium policies are more effective than their Nash counterparts in these problems.
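The idea of noisy gradient estimates computed from observed trajectories can be illustrated with the standard score-function (REINFORCE) estimator. This is a sketch under stated assumptions, not the authors' estimator: the Gaussian policy a_t ~ N(theta, sigma^2), the toy dynamics s_{t+1} = s_t + a_t, the terminal reward -(s_H - target)^2, and the function names are hypothetical choices made so that the exact gradient, -2H(H*theta - target), can be checked in closed form. It also covers only one player's parameter.

```python
import random


def trajectory_gradient_estimate(theta, sigma=1.0, horizon=3, target=3.0,
                                 num_trajectories=50_000, seed=0):
    """Score-function estimate of d/dtheta E[R] on a hypothetical toy task."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trajectories):
        state = 0.0
        score = 0.0  # running sum of d/dtheta log pi(a_t | s_t)
        for _ in range(horizon):
            action = rng.gauss(theta, sigma)        # a_t ~ N(theta, sigma^2)
            score += (action - theta) / sigma ** 2  # gradient of Gaussian log-density
            state += action                         # s_{t+1} = s_t + a_t
        reward = -(state - target) ** 2             # terminal reward only
        total += score * reward
    return total / num_trajectories


def exact_gradient(theta, sigma=1.0, horizon=3, target=3.0):
    # s_H ~ N(H*theta, H*sigma^2), so E[R] = -((H*theta - target)^2 + H*sigma^2)
    return -2.0 * horizon * (horizon * theta - target)
```

For example, trajectory_gradient_estimate(0.0) should land near exact_gradient(0.0) = 18.0. Each estimate is unbiased but noisy, which is why methods built on it are analyzed as stochastic gradient methods.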

Paper Structure

This paper contains 23 sections, 12 theorems, 28 equations, 4 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 3.1

Let $(\mathcal{X}, \mathcal{Y}, f, g)$ be a convex-concave min-max Stackelberg game for which the smoothness assumption (assum:smooth) holds. For any $\varepsilon, \delta \geq 0$, if nested SGDA (resp. saddle-point-oracle SGD) is run with inputs that satisfy, for all $t \in \mathbb{N}_+$, $\learnrate[\outer]$ …
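To make the nested structure referenced in Theorem 3.1 concrete, here is a minimal sketch of a nested stochastic gradient descent ascent loop on a hypothetical one-dimensional game, not the paper's setup. The leader picks x in [0, 1], the follower picks y from the x-dependent set Y(x) = {y : 0 ≤ y ≤ x}, and the objective is f(x, y) = x^2 + y - y^2. The inner loop runs projected stochastic gradient ascent for the follower; the outer step assumes the envelope-theorem gradient of the leader's value function, df/dx + λ dg/dx for the constraint g(x, y) = x - y ≥ 0, with the multiplier λ read off the inner solution. Step sizes, noise level, and all names are illustrative assumptions.

```python
import random


def grad_x_f(x, y):
    # f(x, y) = x**2 + y - y**2, so df/dx = 2x
    return 2.0 * x


def grad_y_f(x, y):
    # df/dy = 1 - 2y
    return 1.0 - 2.0 * y


def project(v, lo, hi):
    # Euclidean projection onto the interval [lo, hi]
    return min(max(v, lo), hi)


def nested_sgda(x0=0.9, outer_steps=200, inner_steps=100,
                eta_x=0.01, eta_y=0.05, noise=0.1, seed=0):
    rng = random.Random(seed)
    x, y = x0, 0.0
    for _ in range(outer_steps):
        # Inner loop: projected stochastic gradient ascent over Y(x) = [0, x]
        for _ in range(inner_steps):
            gy = grad_y_f(x, y) + rng.gauss(0.0, noise)
            y = project(y + eta_y * gy, 0.0, x)
        # Multiplier of g(x, y) = x - y >= 0; nonzero only when y sits on the bound
        lam = max(0.0, grad_y_f(x, y)) if y >= x - 1e-9 else 0.0
        # Assumed envelope-theorem gradient: df/dx + lam * dg/dx, with dg/dx = 1
        gx = grad_x_f(x, y) + lam + rng.gauss(0.0, noise)
        x = project(x - eta_x * gx, 0.0, 1.0)
    return x, y
```

In this toy game the leader's value function is V(x) = x for x ≤ 1/2 and x^2 + 1/4 otherwise, so the Stackelberg solution is x = y = 0. Dropping the λ term would make the leader ignore how its choice shrinks the follower's feasible set.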

Figures (4)

  • Figure 1: One run of the reach-avoid game. The protagonist (blue) attempts to reach the target set (green), while the antagonist (red) tries to prevent the protagonist from doing so. The dotted line around the antagonist represents the avoid set.
  • Figure 2: The average Bellman error of the Stackelberg version converges close to zero, compared to the average Bellman error of the Nash version.
  • Figure 3: Both figures show an example evaluation run. The Stackelberg policy successfully executes the swerve, while the Nash policy does not.
  • Figure 4: Both figures show an example evaluation run with the same obstacle configuration. The Stackelberg variant successfully reaches the goal, while the Nash variant avoids the first obstacle but collides with the second.
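The reach-avoid outcome described in Figure 1 can be scored per episode with a simple check. This is a hypothetical sketch: it assumes the target set and the avoid set around the antagonist are disks, that positions are sampled at discrete time steps, and that touching the avoid set and the target at the same step counts as failure.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reach_avoid_outcome(protagonist: Sequence[Point], antagonist: Sequence[Point],
                        target_center: Point, target_radius: float,
                        avoid_radius: float) -> bool:
    """True if the protagonist enters the target disk before ever entering
    the avoid disk around the antagonist (hypothetical disk-shaped sets)."""
    for p, q in zip(protagonist, antagonist):
        if math.dist(p, q) <= avoid_radius:
            return False  # entered the avoid set first: failure
        if math.dist(p, target_center) <= target_radius:
            return True   # reached the target while staying outside the avoid set
    return False  # never reached the target within the horizon
```

Checking the avoid condition before the target condition at each step makes simultaneous contact a failure, which is one conservative convention; the paper may score episodes differently.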

Theorems & Definitions (20)

  • Theorem 3.1
  • Theorem 4.1
  • Theorem 5.1
  • Lemma 1: Convex-Concave Assumption
  • proof
  • Lemma 2: Alternative Convex-Concave Assumption
  • Lemma 3: Gradient Approximation Error
  • proof: Proof of lemma:error_bound
  • Theorem D.1
  • proof: Proof of thm:min_max_convergence
  • ...and 10 more