Convergence of Learning Dynamics in Stackelberg Games

Tanner Fiez, Benjamin Chasnov, Lillian J. Ratliff

TL;DR

The paper analyzes the convergence of gradient-based learning dynamics in Stackelberg games with a leader–follower structure and continuous actions. It introduces a differential Stackelberg equilibrium concept, proves key connections between Nash and Stackelberg equilibria in zero-sum settings, and develops both a leader-update with a best-response follower and a two-timescale variant with gradient-play followers, accompanied by almost-sure and finite-time convergence guarantees. The framework is linked to GAN training and adversarial learning, showing that Stackelberg dynamics can mitigate cycling and lead to robust equilibria, including non-Nash Stackelberg points that perform well in practice. Extensive numerical experiments on Stackelberg duopoly, torus-location games, and MNIST-based GANs validate the theory and highlight the practical benefits of hierarchical learning in ML contexts.

Abstract

This paper investigates the convergence of learning dynamics in Stackelberg games. In the class of games we consider, there is a hierarchical game being played between a leader and a follower with continuous action spaces. We establish a number of connections between the Nash and Stackelberg equilibrium concepts and characterize conditions under which attracting critical points of simultaneous gradient descent are Stackelberg equilibria in zero-sum games. Moreover, we show that the only stable critical points of the Stackelberg gradient dynamics are Stackelberg equilibria in zero-sum games. Using this insight, we develop a gradient-based update for the leader while the follower employs a best response strategy for which each stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games. As a result, the learning rule provably converges to a Stackelberg equilibrium given an initialization in the region of attraction of a stable critical point. We then consider a follower employing a gradient-play update rule instead of a best response strategy and propose a two-timescale algorithm with similar asymptotic convergence guarantees. For this algorithm, we also provide finite-time high-probability bounds for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games. Finally, we present extensive numerical results that validate our theory, provide insights into the optimization landscape of generative adversarial networks, and demonstrate that the learning dynamics we propose can effectively train generative adversarial networks.
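
The difference between simultaneous gradient play and the Stackelberg gradient update can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical scalar zero-sum game $f(x,y) = -\tfrac{a}{2}x^2 + cxy - \tfrac{1}{2}y^2$ with illustrative constants $a=c=2$, and it assumes the leader's total derivative has the implicit-function form $Df = D_1 f - D_{21}f_2\,(D_2^2 f_2)^{-1} D_2 f$, where $f_2=-f$ is the follower's cost. The function names and step size are placeholders chosen for this example.

```python
# Hypothetical scalar zero-sum game (f, -f), chosen for illustration only:
#   f(x, y) = -0.5*A*x**2 + C*x*y - 0.5*y**2
# The leader picks x to minimize f; the follower picks y to minimize f_2 = -f.
A, C = 2.0, 2.0  # illustrative constants, not taken from the paper


def d1_f(x, y):
    """Leader's individual gradient D_1 f (used by simultaneous gradient play)."""
    return -A * x + C * y


def d2_f2(x, y):
    """Follower's gradient D_2 f_2, with f_2 = -f."""
    return -C * x + y


def total_d_f(x, y):
    """Leader's total derivative, assuming the implicit-function form
    Df = D_1 f - D_21 f_2 * (D_2^2 f_2)^{-1} * D_2 f (all scalars here)."""
    d2_f = C * x - y   # D_2 f
    d21_f2 = -C        # D_1 (D_2 f_2)
    d22_f2 = 1.0       # D_2^2 f_2 > 0: the follower's problem is strictly convex
    return d1_f(x, y) - d21_f2 * d2_f / d22_f2


def run(leader_grad, steps=500, lr=0.05, x0=1.0, y0=-1.0):
    """Iterate both players with the same constant step size."""
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = leader_grad(x, y), d2_f2(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


sim = run(d1_f)         # simultaneous gradient play
stack = run(total_d_f)  # Stackelberg gradient update for the leader
print("simultaneous:", sim)
print("Stackelberg: ", stack)
```

In this toy game the origin is not a Nash equilibrium, because the leader's own second derivative is $-a<0$, but the curvature of the leader's cost along the follower's best response is $c^2-a>0$. With equal constant step sizes the simultaneous iterates spiral away from the origin while the Stackelberg iterates contract toward it. The sketch uses constant rates for simplicity; the two-timescale algorithm described in the abstract would instead put the follower on the faster timescale.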

Paper Structure

This paper contains 34 sections, 27 theorems, 101 equations, 7 figures.

Key Result

Proposition 1

Attracting critical points of $\dot{x}=-\omega_{\mathcal{S}}(x)$ in continuous zero-sum games are differential Stackelberg equilibria. That is, given a zero-sum game $(f,-f)$ defined by a sufficiently smooth function $f\in C^q(X, \mathbb{R})$ with $q\geq 2$, any stable critical point $x^\ast$ of the

Figures (7)

  • Figure 1: Simultaneous gradient play is attracted to non-Nash differential Stackelberg equilibria: The game is given by the pair of cost functions $(f,-f)$ where $f$ is defined in \ref{eq:polygame} with $a=0.15$ and $b=0.25$. There are two non-Nash attractors of simultaneous gradient play which are also differential Stackelberg equilibria.
  • Figure 2: (a) Firms' Production. Sample learning paths for each firm showing the production evolution and convergence to the Nash equilibrium under the Nash dynamics (i.e., simultaneous gradient-based learning using players' individual gradients with respect to their own choice variable) and convergence to the Stackelberg equilibrium under the Stackelberg dynamics. (b) Firms' Profit. Evolution of each firm's profit under the learning dynamics for both Nash and Stackelberg. Similar convergence characteristics can be observed in (a) and (b). Of note is the improved profit obtained by the leader in the Stackelberg equilibrium compared to the Nash equilibrium.
  • Figure 3: (a-b) Sample learning paths for each player showing the positions and convergence to local Nash equilibria under the Nash dynamics and convergence to local Stackelberg equilibria under the Stackelberg dynamics. The value of player 1's choice variable $\theta_1$ is shown on the horizontal axis and the value of player 2's choice variable $\theta_2$ is shown on the vertical axis. Note that the square depicts the unfolded torus where horizontal edges are equivalent, vertical edges are equivalent, and the corners are all equivalent. The black lines show $D_1f_1$ in (a) and $Df_1$ in (b) where the white lines show $D_2f_2$ in both (a) and (b). (c-d) Position and cost paths for each player for a sampled initial condition under the Nash and Stackelberg dynamics.
  • Figure 4: We estimate the covariance matrix $\Sigma$ with the Stackelberg learning dynamics, where the generator is the leader with choice variable $V\in \mathbb{R}^{m\times m}$ and discriminator is the follower with choice variable $W\in \mathbb{R}^{m\times m}$. Stackelberg learning can more effectively estimate the covariance matrix when compared with simultaneous gradient descent. We demonstrate the convergence for dimensions 3, 9, 25 in (a)--(c), with learning rates $\gamma_{1,k}=0.015(1-10^{-5})^k$, $\gamma_{2, k}=0.015(1-10^{-7})^k$ and regularization $\eta =m/5$. The trajectories of the first element of $W$ and $V$ are plotted over time in (d)--(f). Observe the cycling behavior of simultaneous gradient descent.
  • Figure 5: Convergence to non-Nash Stackelberg equilibria for both simultaneous gradient descent (top row) and Stackelberg learning dynamics (bottom row) in a 2-dimensional mixture-of-Gaussians GAN example. The performance of the generator (player 1) and discriminator (player 2) is plotted in (a)--(b) and (g)--(h). To determine the positive definiteness of the game Jacobian, the Schur complement, and the individual Hessians, we compute the six smallest real eigenvalues and six largest real eigenvalues for each in (c)--(f) and (i)--(l). We observe that for both updates, the leader's Hessian is non-positive while the Schur complement is positive.
  • ...and 2 more figures
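
The leader's profit advantage noted in the Figure 2 caption can be reproduced in a small sketch under stated assumptions. It uses a textbook linear-demand duopoly, which is not necessarily the model used in the paper: inverse demand $P = d - (q_1+q_2)$ and a common unit cost $c$, with hypothetical values $d=10$ and $c=1$. Firm 1 is the leader and firm 2 the follower; the step size and iteration count are arbitrary.

```python
# Textbook linear-demand duopoly, assumed for illustration (not the paper's setup):
#   price = DEMAND - (q1 + q2), profit_i = q_i * (price - COST)
DEMAND, COST = 10.0, 1.0  # hypothetical parameters
M = DEMAND - COST         # market margin


def profit(q_own, q_other):
    return q_own * (M - q_own - q_other)


def own_gradient(q_own, q_other):
    """Derivative of a firm's profit with respect to its own quantity."""
    return M - 2.0 * q_own - q_other


def leader_total_gradient(q1, q2):
    """Leader's profit derivative accounting for the follower's response.

    The follower's best response is r(q1) = (M - q1) / 2, so dr/dq1 = -1/2,
    and the cross effect of q2 on the leader's profit is -q1.
    """
    return own_gradient(q1, q2) + (-q1) * (-0.5)


def learn(leader_grad, steps=1000, lr=0.05, q1=1.0, q2=1.0):
    """Gradient ascent on profits with a constant step size."""
    for _ in range(steps):
        g1, g2 = leader_grad(q1, q2), own_gradient(q2, q1)
        q1, q2 = q1 + lr * g1, q2 + lr * g2
    return q1, q2


nash_q = learn(own_gradient)             # Nash dynamics
stack_q = learn(leader_total_gradient)   # Stackelberg dynamics
print("Nash quantities:", nash_q, "leader profit:", profit(*nash_q))
print("Stackelberg quantities:", stack_q, "leader profit:", profit(*stack_q))
```

Under these assumptions the Nash dynamics settle at $q_1=q_2=(d-c)/3$, while the Stackelberg dynamics settle at $q_1=(d-c)/2$ and $q_2=(d-c)/4$, where the leader earns $(d-c)^2/8$ instead of $(d-c)^2/9$.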

Theorems & Definitions (37)

  • Definition 1: Nash Equilibrium
  • Definition 2: Stackelberg Equilibrium
  • Definition 3: Differential Nash Equilibrium [ratliff:2016aa]
  • Definition 4: Differential Stackelberg Equilibrium
  • Remark 1
  • Proposition 1
  • Proposition 2
  • Remark 2
  • Example 1: Non-Nash Attractors are Stackelberg.
  • Proposition 3: Necessary conditions
  • ...and 27 more