Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold

Sixin Zhang

TL;DR

This work analyzes local convergence of deterministic simultaneous min–max methods for zero-sum differential games on Riemannian manifolds by defining differential Stackelberg equilibrium (DSE) and differential Nash equilibrium (DNE). It establishes linear convergence rates for the two-time-scale algorithm $\tau$-GDA using an Ostrowski-type fixed-point framework and intrinsic Jacobian analysis, with sharp conditions on the learning-rate ratio $\tau$ and step size $\gamma$. To mitigate rotational dynamics near equilibria, it introduces $\tau$-SGA, and, in an asymptotic regime as $\tau\to\infty$, demonstrates potentially faster convergence to DSE via a modified linear operator $\mathbf{M}_s$; a broader range of $\tau$ can be effective when cross-gradient terms are favorable. The theory is then applied to orthogonal Wasserstein GANs, where the discriminator lies on a Stiefel manifold, and numerical experiments on Gaussian data and MNIST/Fashion-MNIST benchmarks show that $\tau$-SGA can offer improved stability and faster convergence in practice, guiding GAN training under manifold constraints.

Abstract

We study min-max algorithms for solving zero-sum differential games on Riemannian manifolds. Based on the notions of differential Stackelberg equilibrium and differential Nash equilibrium on Riemannian manifolds, we analyze the local convergence of two representative deterministic simultaneous algorithms, $\tau$-GDA and $\tau$-SGA, to such equilibria. Sufficient conditions are obtained to establish the linear convergence rate of $\tau$-GDA, based on the Ostrowski theorem on manifolds and spectral analysis. To avoid strong rotational dynamics in $\tau$-GDA, $\tau$-SGA is extended from the symplectic gradient-adjustment method in Euclidean space. We analyze an asymptotic approximation of $\tau$-SGA when the learning-rate ratio $\tau$ is large. In some cases, it can achieve a faster convergence rate to a differential Stackelberg equilibrium than $\tau$-GDA. We show numerically how the insights obtained from the convergence analysis may improve the training of orthogonal Wasserstein GANs using stochastic $\tau$-GDA and $\tau$-SGA on simple benchmarks.
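As a rough illustration of the simultaneous two-time-scale update and of the spectral view of its local rate, the following sketch runs $\tau$-GDA in the flat (Euclidean) special case, where the retraction reduces to plain addition. The toy objective `f(x, y) = 0.5*x**2 + x*y - 0.5*y**2`, the helper name `tau_gda`, and the values of `gamma` and `tau` are assumptions chosen for illustration and are not taken from the paper; the manifold version would replace the additive steps with Riemannian gradients and a retraction.

```python
import numpy as np

def tau_gda(grad_x, grad_y, x, y, gamma, tau, steps):
    """Simultaneous tau-GDA in flat space: descent in x, ascent in y with step tau*gamma."""
    for _ in range(steps):
        # Both gradients are evaluated at the current iterate (simultaneous update).
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - gamma * gx, y + tau * gamma * gy
    return x, y

# Hypothetical toy game with equilibrium (0, 0): convex in x, concave in y.
grad_x = lambda x, y: x + y      # d/dx of 0.5*x**2 + x*y - 0.5*y**2
grad_y = lambda x, y: x - y      # d/dy of the same function

gamma, tau = 0.1, 1.0
x_final, y_final = tau_gda(grad_x, grad_y, 1.0, 1.0, gamma, tau, steps=200)

# Jacobian of the update map at the equilibrium; its spectral radius
# governs the local linear rate in an Ostrowski-type argument.
J = np.eye(2) - gamma * np.array([[1.0, 1.0], [-tau, tau]])
rho = max(abs(np.linalg.eigvals(J)))
print(x_final, y_final, rho)
```

In this toy case the spectral radius is below one, so the iterates contract toward the equilibrium; varying `tau` and `gamma` changes `rho` and hence the observed rate.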

Paper Structure (77 sections, 14 theorems, 105 equations, 12 figures, 5 tables)

Introduction
Differential equilibrium on Riemannian manifold
Differential Stackelberg equilibrium (DSE)
Differential Nash equilibrium (DNE) and examples
Example 1: DSE
Example 2: DSE
Example 3: DNE
Simultaneous Min-max algorithms for differential equilibrium
Local convergence of deterministic simultaneous algorithms
Simultaneous gradient-descent-ascent algorithm ($\tau$-GDA)
$\tau$-GDA algorithm
Local convergence of deterministic $\tau$-GDA
Symplectic gradient-adjustment method ($\tau$-SGA)
$\tau$-SGA algorithm
Local convergence of deterministic $\tau$-SGA to DSE: asymptotic analysis
...and 62 more sections

Key Result

Proposition 2.1

Assume $b \notin \hbox{Range}(A)$ and $\hbox{Ker}(A) = \{ 0 \}$. Let $x^\ast = A^{+} b$ and $y^\ast = \frac{ A x^\ast - b }{ \| A x^\ast - b \|}$. Then $(x^\ast, y^\ast)$ is a DSE of the function $f$ in Example 1.
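The first-order part of this statement can be checked numerically. Example 1 is not reproduced in this listing, so the sketch below assumes $f(x, y) = y^\top (A x - b)$ with $y$ constrained to the unit sphere, a form that is consistent with the expression for $y^\ast$ but is an assumption, not a quotation from the paper. It also reads $A^{+}$ as the Moore-Penrose pseudoinverse. The random `A` and `b` are illustrative, and only stationarity is verified; the second-order conditions required for a DSE are not checked here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # tall random matrix: full column rank, so Ker(A) = {0}
b = rng.standard_normal(5)        # generic b in R^5, not in the 3-dimensional Range(A)

x_star = np.linalg.pinv(A) @ b    # x* = A^+ b (least-squares solution)
r = A @ x_star - b                # nonzero residual because b is not in Range(A)
y_star = r / np.linalg.norm(r)    # y* lies on the unit sphere

# Assumed objective (hypothetical form of Example 1): f(x, y) = y^T (A x - b).
grad_x = A.T @ y_star                             # Euclidean gradient of f in x
egrad_y = A @ x_star - b                          # Euclidean gradient of f in y
rgrad_y = egrad_y - (y_star @ egrad_y) * y_star   # projection onto the sphere's tangent space at y*

grad_x_norm = np.linalg.norm(grad_x)
rgrad_y_norm = np.linalg.norm(rgrad_y)
print(grad_x_norm, rgrad_y_norm)
```

Both norms vanish up to floating-point error: the residual is orthogonal to $\hbox{Range}(A)$, and the Euclidean gradient in $y$ is parallel to $y^\ast$, so its tangential component is zero.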

Figures (12)

Figure 1:
Figure 2:
Figure 3:
Figure 5: $f(x(t), y (t))$
Figure 6: angle$(x(t), y (t))$
...and 7 more figures

Theorems & Definitions (22)

Definition 2.1
Definition 2.2
Proposition 2.1
Proposition 2.2
Proposition 2.3
Definition 3.1: Locally convergent with (linear) rate $\rho \in (0,1)$
Theorem 3.1: Ostrowski Theorem on manifold
Theorem 3.2
Theorem 3.3 (jin2020local, zhang2022near)
Proposition 3.1
...and 12 more
