Adaptive Single-Loop Methods for Stochastic Minimax Optimization on Riemannian Manifolds

Hongye Wang, Chang He, Bo Jiang

TL;DR

This paper tackles stochastic minimax optimization on Riemannian manifolds by introducing adaptive single-loop algorithms that eliminate reliance on problem-parameter knowledge. The deterministic method RAGDA achieves $\mathcal{O}(\epsilon^{-2})$ iterations to an $\epsilon$-stationary point, while the stochastic counterpart RSAGDA attains $\mathcal{O}(\epsilon^{-6})$ (improvable to $\mathcal{O}(\epsilon^{-4})$ with second-order smoothness) under mild assumptions. The authors provide detailed convergence analyses leveraging accumulated gradient norms, retractions, and geodesic concavity properties, along with improvements under second-order smoothness and relaxed conditions. Empirical results on regularized robust MLE and robust neural network training with orthonormal weights corroborate the effectiveness of the adaptive approach and its practical potential for parameter-free Riemannian minimax optimization.

Abstract

Stochastic minimax optimization on Riemannian manifolds has recently attracted significant attention due to its broad range of applications, such as robust training of neural networks and robust maximum likelihood estimation. Existing optimization methods for these problems typically require selecting stepsizes based on prior knowledge of specific problem parameters, such as Lipschitz-type constants and (geodesic) strong concavity constants. Unfortunately, these parameters are often unknown in practice. To overcome this issue, we develop single-loop adaptive methods that automatically adjust stepsizes using cumulative Riemannian (stochastic) gradient norms. We first propose a deterministic single-loop Riemannian adaptive gradient descent ascent method and show that it attains an $\epsilon$-stationary point within $\mathcal{O}(\epsilon^{-2})$ iterations. This deterministic method is of independent interest and lays the foundation for our subsequent stochastic method. In particular, we propose the Riemannian stochastic adaptive gradient descent ascent method, which finds an $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-6})$ iterations. Under additional second-order smoothness, this iteration complexity is further improved to $\mathcal{O}(\epsilon^{-4})$, which even outperforms the corresponding complexity result in Euclidean space. Numerical experiments are conducted on real-world applications, including the regularized robust maximum likelihood estimation problem and the robust training of neural networks with orthonormal weights. The results are encouraging and demonstrate the effectiveness of adaptivity in practice.
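
To make the idea of stepsizes driven by cumulative Riemannian gradient norms concrete, here is a minimal sketch on a toy problem. It is not the authors' RAGDA: this summary does not state the exact update rule, so the sketch assumes a TiAda-style rule in which the $x$-stepsize is $\eta^x / \max(v^x, v^y)^{\alpha}$ and the $y$-stepsize is $\eta^y / (v^y)^{\beta}$, with $v^x$ and $v^y$ the running sums of squared gradient norms. The toy objective $f(x, y) = y^\top A x - \frac{\mu}{2}\|y\|^2$ with $x$ on the unit sphere, the function names, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def project_tangent(x, g):
    """Project a Euclidean gradient g onto the tangent space of the unit sphere at x."""
    return g - np.dot(x, g) * x

def retract(x, v):
    """Retraction on the unit sphere: move along the tangent vector v, then renormalize."""
    z = x + v
    return z / np.linalg.norm(z)

def adaptive_gda_sphere(A, mu, x0, T=3000, eta_x=0.1, eta_y=0.5,
                        alpha=0.6, beta=0.4, v0=1e-8):
    """Single-loop adaptive gradient descent ascent (illustrative, assumed update rule).

    Minimizes over x on the unit sphere and maximizes over y in Euclidean space
    for the toy objective f(x, y) = y^T A x - (mu / 2) * ||y||^2.
    """
    x = x0 / np.linalg.norm(x0)
    y = np.zeros(A.shape[0])
    vx, vy = v0, v0  # accumulated squared gradient norms (v0 > 0 is an assumption)
    grad_norms, steps_x = [], []
    for _ in range(T):
        gx = project_tangent(x, A.T @ y)  # Riemannian gradient with respect to x
        gy = A @ x - mu * y               # Euclidean gradient with respect to y
        vx += gx @ gx
        vy += gy @ gy
        # No smoothness or strong-concavity constants are used to set the stepsizes.
        step_x = eta_x / max(vx, vy) ** alpha
        step_y = eta_y / vy ** beta
        x = retract(x, -step_x * gx)      # descent step on the manifold
        y = y + step_y * gy               # ascent step
        grad_norms.append(float(np.sqrt(gx @ gx + gy @ gy)))
        steps_x.append(step_x)
    return x, y, np.array(grad_norms), np.array(steps_x)

A = np.diag([3.0, 2.0, 1.0])
x, y, grad_norms, steps_x = adaptive_gda_sphere(A, mu=1.0, x0=np.ones(3))
print("||x|| =", np.linalg.norm(x))
print("first / last gradient norm:", grad_norms[0], grad_norms[-1])
```

Both updates happen in the same loop iteration, which is what "single-loop" refers to. Because the accumulators never decrease, the stepsizes are non-increasing, and choosing `alpha > beta` makes the descent step shrink faster than the ascent step; that choice follows the TiAda convention and is an assumption here.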

Paper Structure

This paper contains 22 sections, 25 theorems, 135 equations, 4 figures, 4 tables, 2 algorithms.

Key Result

Lemma 1

Under Assumptions ass.l-smooth, ass.g-convex, ass.optimal (an optimal solution exists and the stationarity condition holds), ass.manifold, and ass.retraction, for any $T \ge t_0 + 1$, where $t_0$ denotes the first iteration such that $(v_{t_0+1}^y)^\beta > c_1$, with $c_1 := \max \left\{ \frac{4 \eta^y \mu L_1}{\mu + L_1},\; \ldots \right\}$ … Here, $c_2 = 4(\mu + L_1) \left( \frac{1}{\mu^2} + \frac{\bar{c}\eta^y}{(v_{t_0}^y)^\beta} \right)$ …

Figures (4)

  • Figure 1: Experiments on regularized robust maximum likelihood estimation.
  • Figure 2: Results of robust training of neural networks with orthonormal weights on the MNIST dataset.
  • Figure 3: Results of robust training of neural networks with orthonormal weights on the FashionMNIST dataset.
  • Figure 4: Results of robust training of neural networks with orthonormal weights on the CIFAR10 dataset.

Theorems & Definitions (49)

  • Definition 1: $\epsilon$-stationary point
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Remark 1
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Theorem 2
  • ...and 39 more