A Riemannian Alternating Descent Ascent Algorithmic Framework for Nonconvex-Linear Minimax Problems on Riemannian Manifolds
Meng Xu, Bo Jiang, Ya-Feng Liu, Anthony Man-Cho So
TL;DR
We address nonconvex-linear minimax problems on Riemannian manifolds and introduce the Riemannian alternating descent ascent (RADA) framework, which uses a differentiable surrogate $\Phi_k$ to capture the best response in $y$ while updating $x$ via a descent subproblem. Two practical single-loop instantiations, RADA-PGD and RADA-RGD, are analyzed, and both are shown to find an $\varepsilon$-Riemannian-game-stationary ($\varepsilon$-RGS) point or an $\varepsilon$-Riemannian-optimization-stationary ($\varepsilon$-ROS) point within $\mathcal{O}(\varepsilon^{-3})$ iterations, matching the best-known rate for this problem class. The paper also clarifies how these algorithms relate to the Riemannian augmented Lagrangian method (RALM) and the Riemannian ADMM (RADMM), which helps explain their efficiency advantage, and reports numerical results on sparse PCA (SPCA), fair PCA (FPCA), and sparse spectral clustering (SSC) in which the proposed algorithms outperform existing ones. Together, these results offer a principled, scalable approach to Riemannian minimax problems, with potential extensions to stochastic and broader nonsmooth settings.
Abstract
In this paper, we consider a class of nonconvex-linear minimax problems on Riemannian manifolds, which find wide applications in machine learning and signal processing. For solving this class of problems, we develop a flexible Riemannian alternating descent ascent (RADA) algorithmic framework. Within this framework, we propose two easy-to-implement yet efficient algorithms that alternately perform one or multiple projected/Riemannian gradient descent steps and a proximal gradient ascent step at each iteration. We show that the proposed RADA algorithmic framework can find both an $\varepsilon$-Riemannian-game-stationary point and an $\varepsilon$-Riemannian-optimization-stationary point within $\mathcal{O}(\varepsilon^{-3})$ iterations, achieving the best-known iteration complexity. We also reveal intriguing similarities and differences between the algorithms developed within our proposed framework and existing algorithms, thus providing important insights into the improved efficiency of the former. Lastly, we present numerical results on sparse principal component analysis (PCA), fair PCA, and sparse spectral clustering to demonstrate the superior performance of the proposed algorithms.
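To make the alternating structure described above concrete, the following is a minimal sketch, not the authors' RADA-PGD or RADA-RGD. It assumes a toy instance chosen here for illustration: $\ell_1$-penalized sparse PCA on the unit sphere, written as $\min_{\|x\|_2 = 1} \max_{\|y\|_\infty \le \mu} -x^\top A x + \langle y, x \rangle$, which is nonconvex in $x$ and linear in $y$. Each iteration takes one Riemannian gradient descent step in $x$ and one projected gradient ascent step in $y$ (the proximal step for a box constraint reduces to clipping). The function names, the fixed step sizes `eta` and `sigma`, and the omission of the surrogate $\Phi_k$ and its regularization are all assumptions of this sketch.

```python
import math


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def normalize(v):
    """Scale a nonzero vector to unit Euclidean norm (retraction on the sphere)."""
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]


def alternating_descent_ascent(A, mu, x0, steps=500, eta=0.01, sigma=1.0):
    """Illustrative alternating descent ascent on the unit sphere.

    Hypothetical helper for this sketch only; eta and sigma are
    hand-picked constant step sizes, not the paper's parameter rules.
    """
    x = normalize(x0)
    y = [0.0] * len(x)
    for _ in range(steps):
        # Euclidean gradient in x of -x^T A x + <y, x> (A assumed symmetric).
        Ax = matvec(A, x)
        egrad = [-2.0 * axi + yi for axi, yi in zip(Ax, y)]
        # Riemannian gradient: project onto the tangent space at x.
        inner = sum(xi * gi for xi, gi in zip(x, egrad))
        rgrad = [gi - inner * xi for gi, xi in zip(egrad, x)]
        # Descent step in x followed by retraction back to the sphere.
        x = normalize([xi - eta * ri for xi, ri in zip(x, rgrad)])
        # Ascent step in y, then projection onto the box |y_i| <= mu.
        y = [min(mu, max(-mu, yi + sigma * xi)) for yi, xi in zip(y, x)]
    return x, y
```

For example, `alternating_descent_ascent([[3.0, 1.0], [1.0, 2.0]], mu=0.1, x0=[1.0, 1.0])` returns a unit-norm `x` and a dual vector `y` whose entries lie in $[-\mu, \mu]$. The sketch carries no convergence guarantee; the $\mathcal{O}(\varepsilon^{-3})$ rate stated above applies to the algorithms analyzed in the paper.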
