Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation

Yang Cai; Michael I. Jordan; Tianyi Lin; Argyris Oikonomou; Emmanouil-Vasileios Vlatakis-Gkaragkounis

TL;DR

The paper addresses equilibrium computation on Riemannian manifolds by introducing geodesically strongly monotone games and proving that Riemannian gradient descent (RGD) attains last-iterate linear convergence in a geometry-agnostic manner, independent of sectional curvature. It extends these results to stochastic (SRGD) and fully adaptive (FARGD) variants, establishing geometry-agnostic sample complexity and convergence rates with explicit dependence on the condition number $\kappa = \ell/\mu$. The key technical advance is a generalized descent lemma that avoids curvature-dependent tools, enabling linear convergence without assuming unique geodesics and extending to local regions where geodesic monotonicity holds. Empirically, RPCA on SPD and sphere manifolds demonstrates that the last-iterate solutions converge linearly and robustly under varying problem scales and adaptivity, underscoring the practical relevance of the approach for manifold-valued ML tasks.

Abstract

Equilibrium computation on Riemannian manifolds provides a unifying framework for numerous problems in machine learning and data analytics. One of the simplest yet most fundamental methods is Riemannian gradient descent (RGD). While its Euclidean counterpart has been extensively studied, it remains unclear how the manifold curvature affects RGD in game-theoretic settings. This paper addresses this gap by establishing new convergence results for geodesically strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion, a key property for applications in machine learning. We extend this guarantee to stochastic and adaptive variants (SRGD and FARGD) and establish that: (i) the sample complexity of SRGD is geometry-agnostic and optimal with respect to noise; (ii) FARGD matches the convergence rate of its non-adaptive counterpart up to constant factors, while avoiding reliance on the condition number. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting, underscoring the surprising power of RGD, despite its simplicity, in solving a wide spectrum of machine learning problems.
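
For orientation, the RGD iteration referred to in the abstract is commonly written as shown here. This is a sketch in standard notation rather than the paper's own statement: the stepsize $\eta > 0$, the exponential map $\mathrm{Exp}$, and the reading of $F$ as the vector field collecting each player's Riemannian gradient are assumptions about notation, and the constant $c$ is left unspecified because this summary does not give it.

```latex
x_{t+1} = \mathrm{Exp}_{x_t}\bigl(-\eta\, F(x_t)\bigr),
\qquad
d(x_T, x^\star)^2 \le (1 - c)^T\, d(x_0, x^\star)^2
\quad \text{for some } c \in (0, 1) \text{ depending on } \kappa = \ell/\mu .
```

The second expression is the generic shape of a last-iterate linear rate: the bound applies to the final iterate $x_T$ itself, not to an average of the iterates.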

Paper Structure (19 sections, 9 theorems, 69 equations, 4 figures, 1 algorithm)

Introduction
Related works
Contributions
Organization
Preliminaries
Geodesically Monotone Games
Applications
Geometry-Agnostic Convergence Rates
Deterministic and Stochastic Riemannian Gradient Descent
Fully Adaptive Riemannian Gradient Descent
Experiments
Experimental setup
Experimental results
Concluding Remarks
Additional Related Work
...and 4 more sections

Key Result

Lemma 3.1

The joint action profile $x^\star \in \mathcal{M}$ is a Nash equilibrium of a smooth Riemannian game if $\|F(x^\star)\|_{x^\star}= 0$.
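
The residual $\|F(x)\|_{x}$ in the lemma can also serve as a progress measure for RGD. The following minimal sketch illustrates this on a toy two-player game; it is not taken from the paper. Everything in it is an assumption made for illustration: each player acts on the positive reals with the metric $\langle a, b\rangle_p = ab/p^2$ (the $1\times 1$ SPD manifold, which is flat and therefore does not exercise curvature), the costs `f1` and `f2` are hypothetical, and the stepsize 0.5 equals $\mu/\ell^2$ for this particular toy field ($\mu = 1$, $\ell = \sqrt{2}$).

```python
import math

def vector_field(x, y):
    """Each player's Riemannian gradient on (R_{>0}, <a, b>_p = a*b / p**2)."""
    u, v = math.log(x), math.log(y)
    # Hypothetical costs: player 1 minimizes f1 = u^2/2 + u*v over x,
    # player 2 minimizes f2 = v^2/2 - u*v over y (u = log x, v = log y).
    # Riemannian gradient = p^2 * (Euclidean partial derivative).
    return x * (u + v), y * (v - u)

def field_norm(x, y):
    """Norm of F(x, y) in the tangent space at (x, y)."""
    gx, gy = vector_field(x, y)
    return math.hypot(gx / x, gy / y)

def exp_map(p, tangent):
    """Exponential map on the positive reals under the metric above."""
    return p * math.exp(tangent / p)

def rgd(x, y, stepsize, num_steps):
    """Run RGD and record the residual norm of every iterate."""
    norms = [field_norm(x, y)]
    for _ in range(num_steps):
        gx, gy = vector_field(x, y)
        x, y = exp_map(x, -stepsize * gx), exp_map(y, -stepsize * gy)
        norms.append(field_norm(x, y))
    return x, y, norms

x_last, y_last, norms = rgd(x=3.0, y=0.2, stepsize=0.5, num_steps=40)
print(x_last, y_last, norms[-1])
```

In this toy game the unique equilibrium is $(1, 1)$, and the residual of the last iterate shrinks by a constant factor at every step, which is the behavior the term "last-iterate linear convergence" describes.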

Figures (4)

Figure 1: Illustrations of tangent spaces, parallel transport and exponential maps.
Figure 2: Comparison of last and average iterates for RGD with $d \in \{25, 50, 100\}$ when $(n, \alpha)=(40, 1.0)$ (top) and $(n, \alpha)=(40, 2.0)$ (bottom). The horizontal and vertical axes represent the number of data passes and the norm of the Riemannian gradient, respectively.
Figure 3: Comparison of RGD, SRGD and FARGD (last iterate) with $d \in \{25, 50\}$ when $(n, \alpha)=(200, 1.0)$. The horizontal and vertical axes represent the number of data passes and the norm of the Riemannian gradient, respectively.
Figure 4: Comparison of RGD using different stepsize choices with $d \in \{25, 50\}$ when $(n, \alpha)=(40, 1.0)$. The horizontal and vertical axes represent the number of data passes and the norm of the Riemannian gradient, respectively.

Theorems & Definitions (25)

Definition 3.1
Lemma 3.1
Definition 3.2
Lemma 3.2
Theorem 3.3
Lemma 3.4
Remark 3.5
Example 3.1
Example 3.2
Example 3.3
...and 15 more
