Adaptive Single-Loop Methods for Stochastic Minimax Optimization on Riemannian Manifolds

Hongye Wang; Chang He; Bo Jiang

TL;DR

This paper tackles stochastic minimax optimization on Riemannian manifolds by introducing adaptive single-loop algorithms that require no prior knowledge of problem parameters. The deterministic method RAGDA reaches an $\epsilon$-stationary point within $\mathcal{O}(\epsilon^{-2})$ iterations, while the stochastic counterpart RSAGDA needs $\mathcal{O}(\epsilon^{-6})$ iterations (improvable to $\mathcal{O}(\epsilon^{-4})$ under additional second-order smoothness). The convergence analyses rely on accumulated gradient norms, retractions, and geodesic concavity properties, and also cover relaxed conditions. Experiments on regularized robust MLE and robust neural network training with orthonormal weights support the effectiveness of the adaptive approach and its practical potential for parameter-free Riemannian minimax optimization.

Abstract

Stochastic minimax optimization on Riemannian manifolds has recently attracted significant attention due to its broad range of applications, such as robust training of neural networks and robust maximum likelihood estimation. Existing optimization methods for these problems typically require selecting stepsizes based on prior knowledge of specific problem parameters, such as Lipschitz-type constants and (geodesic) strong concavity constants. Unfortunately, these parameters are often unknown in practice. To overcome this issue, we develop single-loop adaptive methods that automatically adjust stepsizes using cumulative Riemannian (stochastic) gradient norms. We first propose a deterministic single-loop Riemannian adaptive gradient descent ascent method and show that it attains an $\epsilon$-stationary point within $O(\epsilon^{-2})$ iterations. This deterministic method is of independent interest and lays the foundation for our subsequent stochastic method. In particular, we propose the Riemannian stochastic adaptive gradient descent ascent method, which finds an $\epsilon$-stationary point in $O(\epsilon^{-6})$ iterations. Under additional second-order smoothness, this iteration complexity is further improved to $O(\epsilon^{-4})$, which even outperforms the corresponding complexity result in Euclidean space. Numerical experiments on real-world applications, including regularized robust maximum likelihood estimation and robust training of neural networks with orthonormal weights, are encouraging and demonstrate the effectiveness of adaptivity in practice.
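To make the idea of stepsizes driven by cumulative Riemannian gradient norms concrete, the following is a minimal sketch and not the paper's RAGDA. It assumes a toy problem $\min_{x \in S^{n-1}} \max_{y} \; y^\top A x - \tfrac{1}{2}\|y\|^2$ on the unit sphere, a projection retraction, and an AdaGrad-norm style rule with exponents `alpha > beta`. The function names, the toy objective, the exponent values, and the exact form of the stepsize rule are all illustrative assumptions.

```python
import numpy as np

def riemannian_grad_sphere(x, egrad):
    """Project a Euclidean gradient onto the tangent space of the unit sphere at x."""
    return egrad - np.dot(x, egrad) * x

def retract_sphere(x, v):
    """Projection retraction on the unit sphere: step along v, then renormalize."""
    z = x + v
    return z / np.linalg.norm(z)

def adaptive_gda_sphere(A, x0, y0, eta_x=1.0, eta_y=1.0,
                        alpha=0.6, beta=0.4, iters=2000):
    """Single-loop adaptive GDA sketch for min_x max_y  y^T A x - 0.5 * ||y||^2.

    Assumed (hypothetical) stepsize rule, chosen for illustration only:
      step_x = eta_x / max(v_x, v_y)**alpha,   step_y = eta_y / v_y**beta,
    where v_x and v_y accumulate squared gradient norms.
    """
    x = x0 / np.linalg.norm(x0)
    y = y0.astype(float).copy()
    v_x, v_y = 1e-8, 1e-8  # small positive initial accumulators (assumption)
    for _ in range(iters):
        gx = riemannian_grad_sphere(x, A.T @ y)  # Riemannian gradient in x
        gy = A @ x - y                           # Euclidean gradient in y
        v_x += gx @ gx                           # accumulate squared norms
        v_y += gy @ gy
        step_x = eta_x / max(v_x, v_y) ** alpha
        step_y = eta_y / v_y ** beta
        x_next = retract_sphere(x, -step_x * gx)  # descent step on the manifold
        y = y + step_y * gy                       # ascent step in y
        x = x_next
    return x, y

# Usage on a small diagonal instance (values are arbitrary).
A = np.diag([3.0, 2.0, 1.0])
x, y = adaptive_gda_sphere(A, np.ones(3), np.zeros(3))
print("x:", x, "primal value:", 0.5 * np.linalg.norm(A @ x) ** 2)
```

No Lipschitz or strong concavity constant appears in the loop: both stepsizes shrink on their own as gradient norms accumulate, which is the behaviour the abstract describes. Coupling the $x$ stepsize to $\max(v_x, v_y)$ is one common way to keep the descent variable from outrunning the ascent variable, and is used here as an assumption rather than as the paper's rule.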

Paper Structure (22 sections, 25 theorems, 135 equations, 4 figures, 4 tables, 2 algorithms)

This paper contains 22 sections, 25 theorems, 135 equations, 4 figures, 4 tables, 2 algorithms.

Introduction
Motivating examples
Robust training of neural networks.
Robust Maximum Likelihood Estimation.
Related works
Main contributions
Preliminaries: Riemannian geometry
Riemannian adaptive gradient descent ascent
Algorithm design
Convergence analysis
Riemannian stochastic adaptive gradient descent ascent
Riemannian stochastic adaptive gradient descent ascent method: RSAGDA
Convergence analysis
Improved convergence analysis
Numerical experiments
...and 7 more sections

Key Result

Lemma 1

Under Assumptions ass.l-smooth, ass.g-convex, ass.optimal exist and stationary condition hold, ass.manifold, and ass.retraction, for any $T \ge t_0 + 1$, where $t_0$ denotes the first iteration such that $(v_{t_0+1}^y)^\beta > c_1$, where $c_1 := \max \left\{ \frac{4 \eta^y \mu L_1}{\mu + L_1},\; \dots \right\}$ […] Here, $c_2 = 4(\mu + L_1) \left( \frac{1}{\mu^2} + \frac{\bar{c}\eta^y}{(v_{t_0}^y)^\beta} \right)$.

Figures (4)

Figure 1: Experiments on regularized robust maximum likelihood estimation.
Figure 2: Results of robust training of neural networks with orthonormal weights on the MNIST dataset.
Figure 3: Results of robust training of neural networks with orthonormal weights on the FashionMNIST dataset.
Figure 4: Results of robust training of neural networks with orthonormal weights on the CIFAR10 dataset.

Theorems & Definitions (49)

Definition 1: $\epsilon$-stationary point
Lemma 1
Lemma 2
Lemma 3
Theorem 1
Remark 1
Lemma 4
Lemma 5
Lemma 6
Theorem 2
...and 39 more
