
Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold

Sixin Zhang

TL;DR

This work analyzes local convergence of deterministic simultaneous min–max methods for zero-sum differential games on Riemannian manifolds by defining differential Stackelberg equilibrium (DSE) and differential Nash equilibrium (DNE). It establishes linear convergence rates for the two-time-scale algorithm $\tau$-GDA using an Ostrowski-type fixed-point framework and intrinsic Jacobian analysis, with sharp conditions on the learning-rate ratio $\tau$ and step size $\gamma$. To mitigate rotational dynamics near equilibria, it introduces $\tau$-SGA, and, in an asymptotic regime as $\tau\to\infty$, demonstrates potentially faster convergence to DSE via a modified linear operator $\mathbf{M}_s$; a broader range of $\tau$ can be effective when cross-gradient terms are favorable. The theory is then applied to orthogonal Wasserstein GANs, where the discriminator lies on a Stiefel manifold, and numerical experiments on Gaussian data and MNIST/Fashion-MNIST benchmarks show that $\tau$-SGA can offer improved stability and faster convergence in practice, guiding GAN training under manifold constraints.
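The τ-GDA iteration and its Ostrowski-style linear rate can be illustrated on a toy problem. The sketch below rests on our own assumptions and is not the paper's implementation. The following are all hypothetical choices: the game f(x, Y) = Yᵀ(Ax − b), the instance A, b, the helper names `stiefel_proj`, `stiefel_qr_retract` and `tau_gda`, and the convention that the maximizing player Y takes the step τγ while x takes γ. Y lives on St(3, 1), the unit sphere, which is the simplest Stiefel manifold. The tangent projection and QR retraction are standard choices for Stiefel constraints, and the paper may use a different retraction. The spectral radius of the linearized one-step map, derived by hand for this instance, predicts the observed linear rate.

```python
import numpy as np

def stiefel_proj(W, Z):
    """Project Z onto the tangent space of the Stiefel manifold at W (W^T W = I)."""
    WtZ = W.T @ Z
    return Z - W @ ((WtZ + WtZ.T) / 2)

def stiefel_qr_retract(W, xi):
    """QR retraction: orthonormal factor of W + xi, columns signed so that diag(R) > 0."""
    Q, R = np.linalg.qr(W + xi)
    return Q * np.sign(np.diag(R))

def tau_gda(A, b, x, Y, gamma, tau, n_iter):
    """Simultaneous tau-GDA for the HYPOTHETICAL game f(x, Y) = Y^T (A x - b):
    x in R^n is minimized (step gamma), Y in St(m, 1), the unit sphere,
    is maximized (step tau * gamma)."""
    for _ in range(n_iter):
        r = A @ x - b                        # residual A x - b, shape (m,)
        g_x = A.T @ Y[:, 0]                  # Euclidean gradient of f in x
        g_Y = stiefel_proj(Y, r[:, None])    # Riemannian gradient of f in Y
        # Both updates use the old iterate (simultaneous, not alternating).
        x, Y = x - gamma * g_x, stiefel_qr_retract(Y, tau * gamma * g_Y)
    return x, Y

# Toy instance with a known equilibrium: x* = A^+ b = (1, 2), Y* = (0, 0, -1).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])
x_star = np.linalg.pinv(A) @ b
r_star = A @ x_star - b
Y_star = (r_star / np.linalg.norm(r_star))[:, None]

gamma, tau = 0.1, 1.0
# Linearizing one step at (x*, Y*) in tangent coordinates (done by hand for this
# instance) gives I + gamma * J with J = [[0, -1], [tau, -3 * tau]] acting on each
# of the two coordinate pairs; its spectral radius is the predicted linear rate.
J = np.array([[0.0, -1.0], [tau, -3.0 * tau]])
rho = max(abs(np.linalg.eigvals(np.eye(2) + gamma * J)))

# Start near the equilibrium and iterate.
x0 = x_star + 0.05 * np.array([1.0, -1.0])
Y0 = stiefel_qr_retract(Y_star, np.array([[0.05], [0.05], [0.0]]))
x_T, Y_T = tau_gda(A, b, x0, Y0, gamma, tau, n_iter=1000)
err = np.linalg.norm(x_T - x_star) + np.linalg.norm(Y_T - Y_star)
print(f"predicted rate rho = {rho:.4f}, final error = {err:.2e}")
```

For this instance at τ = 1 the eigenvalues of J are (−3 ± √5)/2, both real and negative. Hence ρ < 1 holds exactly when γ < 2/2.618 ≈ 0.76, and beyond that the equilibrium becomes unstable for this loop. This is the kind of joint condition on γ and τ that the paper states in general.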

Abstract

We study min-max algorithms to solve zero-sum differential games on Riemannian manifolds. Based on the notions of differential Stackelberg equilibrium and differential Nash equilibrium on Riemannian manifolds, we analyze the local convergence of two representative deterministic simultaneous algorithms, $\tau$-GDA and $\tau$-SGA, to such equilibria. Sufficient conditions are obtained to establish the linear convergence rate of $\tau$-GDA based on the Ostrowski theorem on manifolds and spectral analysis. To avoid strong rotational dynamics in $\tau$-GDA, $\tau$-SGA is extended from the symplectic gradient-adjustment method in Euclidean space. We analyze an asymptotic approximation of $\tau$-SGA when the learning rate ratio $\tau$ is large. In some cases, it can achieve a faster convergence rate to differential Stackelberg equilibrium than $\tau$-GDA. We show numerically how the insights obtained from the convergence analysis may improve the training of orthogonal Wasserstein GANs using stochastic $\tau$-GDA and $\tau$-SGA on simple benchmarks.
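The rotational dynamics that motivate $\tau$-SGA already appear in the Euclidean bilinear game f(x, y) = xy. The sketch below is a toy Euclidean illustration of the symplectic gradient adjustment (SGA) of Balduzzi et al. It uses an adjustment weight `lam` and a step ratio `tau` on the maximizing player. The helper names, `lam`, and the way `tau` and `lam` are combined are our assumptions and not the paper's exact $\tau$-SGA update on manifolds.

```python
import numpy as np

def step_matrix(gamma, tau, lam):
    """Linear map of one simultaneous step for f(x, y) = x * y (x minimizes, y maximizes).

    x <- x - gamma * (df/dx + lam * d2f/dxdy * df/dy)        = x - gamma * (y + lam * x)
    y <- y + tau * gamma * (df/dy - lam * d2f/dydx * df/dx)  = y + tau * gamma * (x - lam * y)
    lam = 0 gives plain tau-GDA; lam > 0 adds the (assumed) symplectic adjustment.
    """
    return np.array([[1.0 - gamma * lam, -gamma],
                     [tau * gamma, 1.0 - tau * gamma * lam]])

def spectral_radius(M):
    """Largest eigenvalue modulus of M."""
    return float(max(abs(np.linalg.eigvals(M))))

def run(gamma, tau, lam, z0, n_iter):
    """Iterate the step from z0 = (x0, y0) and return the final (x, y)."""
    M = step_matrix(gamma, tau, lam)
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iter):
        z = M @ z
    return z

z0 = [1.0, 1.0]
gda = run(gamma=0.1, tau=1.0, lam=0.0, z0=z0, n_iter=200)  # spirals outward
sga = run(gamma=0.1, tau=1.0, lam=1.0, z0=z0, n_iter=200)  # spirals inward to (0, 0)
print(np.linalg.norm(gda), np.linalg.norm(sga))
print(spectral_radius(step_matrix(0.1, 1.0, 0.0)), spectral_radius(step_matrix(0.1, 1.0, 1.0)))
```

With γ = 0.1 and τ = 1, the GDA step is √1.01 times a rotation, so every iterate moves farther from the equilibrium (0, 0). The adjusted step is √0.82 times a rotation and contracts. The printed spectral radii express the same fact in the language of the Ostrowski criterion.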

Paper Structure (77 sections, 14 theorems, 105 equations, 12 figures, 5 tables)

Key Result

Proposition 2.1

Assume $b \notin \operatorname{Range}(A)$ and $\operatorname{Ker}(A) = \{0\}$. Let $x^\ast = A^{+} b$, where $A^{+}$ is the Moore-Penrose pseudoinverse, and $y^\ast = \frac{A x^\ast - b}{\| A x^\ast - b \|}$. Then $(x^\ast, y^\ast)$ is a DSE of the function $f$ in Example 1.
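Proposition 2.1 can be checked numerically once Example 1 is fixed. This excerpt does not reproduce Example 1, so the sketch below makes a hypothetical assumption about it. It takes f(x, y) = ⟨Ax − b, y⟩, with x ∈ ℝⁿ minimized and y on the unit sphere maximized. It also assumes that a DSE is characterized by the Riemannian analogue of the usual second-order conditions: vanishing Riemannian gradients, a negative-definite Riemannian Hessian in y, and a positive-definite Schur complement for x. The helper names `tangent_basis` and `dse_diagnostics` are ours.

```python
import numpy as np

def tangent_basis(y):
    """Orthonormal basis (columns) of the tangent space {v : y^T v = 0} of the unit sphere at y."""
    q, _ = np.linalg.qr(y[:, None], mode="complete")
    return q[:, 1:]  # the first column of q is +/- y

def dse_diagnostics(A, b):
    """First- and second-order checks at the point of Proposition 2.1 for the ASSUMED
    game f(x, y) = <A x - b, y>, x in R^n (min), y on the unit sphere (max)."""
    x = np.linalg.pinv(A) @ b               # x* = A^+ b (least-squares solution)
    r = A @ x - b                           # nonzero because b is not in Range(A)
    y = r / np.linalg.norm(r)               # y* = (A x* - b) / ||A x* - b||
    U = tangent_basis(y)

    grad_x = A.T @ y                        # Euclidean gradient in x
    grad_y = r - y * (y @ r)                # Riemannian gradient on the sphere
    # Riemannian Hessian in y on the sphere, with zero Euclidean Hessian: -(y^T r) Id
    H_yy = -(y @ r) * np.eye(U.shape[1])
    H_xy = A.T @ U                          # mixed second derivative in tangent coordinates
    H_xx = np.zeros((A.shape[1], A.shape[1]))
    schur = H_xx - H_xy @ np.linalg.solve(H_yy, H_xy.T)
    return {
        "grad_norm": float(np.linalg.norm(grad_x) + np.linalg.norm(grad_y)),
        "max_eig_H_yy": float(np.linalg.eigvalsh(H_yy).max()),
        "min_eig_schur": float(np.linalg.eigvalsh(schur).min()),
    }

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # full column rank with probability 1, so Ker(A) = {0}
b = rng.standard_normal(6)       # almost surely not in Range(A)
print(dse_diagnostics(A, b))
```

In this assumed example, the residual $A x^\ast - b$ is orthogonal to $\operatorname{Range}(A)$, so $A^\top y^\ast = 0$. The Schur complement then reduces to $A^\top A / \| A x^\ast - b \|$, which is positive definite exactly when $\operatorname{Ker}(A) = \{0\}$. The condition $b \notin \operatorname{Range}(A)$ makes the residual nonzero, so $y^\ast$ is well defined and the Hessian in $y$ is negative definite.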

Figures (12)

  • Figure 1:
  • Figure 2:
  • Figure 3:
  • Figure 5: $f(x(t), y(t))$
  • Figure 6: $\mathrm{angle}(x(t), y(t))$
  • ...and 7 more figures

Theorems & Definitions (22)

  • Definition 2.1
  • Definition 2.2
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Definition 3.1: Locally convergent with (linear) rate $\rho \in (0,1)$
  • Theorem 3.1: Ostrowski Theorem on manifold
  • Theorem 3.2
  • Theorem 3.3: jin2020local, zhang2022near
  • Proposition 3.1
  • ...and 12 more