Adaptive Single-Loop Methods for Stochastic Minimax Optimization on Riemannian Manifolds
Hongye Wang, Chang He, Bo Jiang
TL;DR
This paper tackles stochastic minimax optimization on Riemannian manifolds with adaptive single-loop algorithms that require no prior knowledge of problem parameters. The deterministic method, RAGDA, reaches an $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations, while its stochastic counterpart, RSAGDA, needs $\mathcal{O}(\epsilon^{-6})$ iterations under mild assumptions (improvable to $\mathcal{O}(\epsilon^{-4})$ with second-order smoothness). The convergence analyses leverage accumulated gradient norms, retractions, and geodesic concavity properties, and extend to relaxed conditions. Empirical results on regularized robust MLE and robust neural network training with orthonormal weights corroborate the effectiveness of the adaptive approach and its practical potential for parameter-free Riemannian minimax optimization.
Abstract
Stochastic minimax optimization on Riemannian manifolds has recently attracted significant attention due to its broad range of applications, such as robust training of neural networks and robust maximum likelihood estimation. Existing optimization methods for these problems typically require selecting stepsizes based on prior knowledge of specific problem parameters, such as Lipschitz-type constants and (geodesic) strong concavity constants. Unfortunately, these parameters are often unknown in practice. To overcome this issue, we develop single-loop adaptive methods that automatically adjust stepsizes using cumulative Riemannian (stochastic) gradient norms. We first propose a deterministic single-loop Riemannian adaptive gradient descent ascent method and show that it attains an $\epsilon$-stationary point within $\mathcal{O}(\epsilon^{-2})$ iterations. This deterministic method is of independent interest and lays the foundation for our subsequent stochastic method. In particular, we propose the Riemannian stochastic adaptive gradient descent ascent method, which finds an $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-6})$ iterations. Under additional second-order smoothness, this iteration complexity is further improved to $\mathcal{O}(\epsilon^{-4})$, which even outperforms the corresponding complexity result in Euclidean space. Numerical experiments on real-world applications are conducted, including regularized robust maximum likelihood estimation and the robust training of neural networks with orthonormal weights. The results are encouraging and demonstrate the effectiveness of adaptivity in practice.
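To make the idea of stepsizes driven by cumulative Riemannian gradient norms concrete, the following is a minimal sketch, not the paper's RAGDA or RSAGDA. Everything in it is an assumption chosen for illustration: the toy objective $f(x, y) = \langle b, x\rangle + y\langle a, x\rangle - \tfrac{1}{2}y^2$ with $x$ on the unit sphere and $y \in \mathbb{R}$, the renormalization retraction, the AdaGrad-norm-style stepsizes with exponent $1/2$, the use of $\max(v_x, v_y)$ in the $x$ stepsize, and the hypothetical function name `adaptive_rgda`.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def project_tangent(x, g):
    """Project a Euclidean gradient g onto the tangent space of the unit sphere at x."""
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]


def retract(x, step):
    """Sphere retraction: take the step in the ambient space, then renormalize."""
    z = [xi + si for xi, si in zip(x, step)]
    n = norm(z)
    return [zi / n for zi in z]


def adaptive_rgda(a, b, x0, y0=0.0, iters=500, v0=1.0):
    """Single-loop adaptive descent ascent on the toy problem (illustrative only).

    Minimizes over x on the unit sphere and maximizes over y in R the function
    f(x, y) = <b, x> + y * <a, x> - 0.5 * y**2, which is strongly concave in y.
    No Lipschitz or concavity constant is passed in; the stepsizes come only
    from the accumulated squared gradient norms v_x and v_y.
    """
    x, y = list(x0), y0
    v_x, v_y = v0, v0  # accumulators (assumed positive initial value)
    history = []       # norm of the full Riemannian gradient per iteration
    for _ in range(iters):
        # Riemannian gradient in x: project the Euclidean gradient b + y * a.
        gx = project_tangent(x, [bi + y * ai for ai, bi in zip(a, b)])
        # Gradient in y (Euclidean, since y lives in R).
        gy = dot(a, x) - y
        history.append(math.sqrt(dot(gx, gx) + gy * gy))

        # Accumulate squared gradient norms, then derive the stepsizes from them.
        v_x += dot(gx, gx)
        v_y += gy * gy
        eta_x = 1.0 / math.sqrt(max(v_x, v_y))  # assumed coupling of the two accumulators
        eta_y = 1.0 / math.sqrt(v_y)

        # Simultaneous update: descent on x via the retraction, ascent on y.
        x = retract(x, [-eta_x * gi for gi in gx])
        y = y + eta_y * gy
    return x, y, history
```

Calling `adaptive_rgda([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0])` returns the final iterate together with the recorded gradient norms. Because the accumulators never decrease, both stepsizes are non-increasing, which is the mechanism that replaces hand-tuned, constant-dependent stepsizes in this sketch.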
