Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
Haoyuan Cai, Sulaiman A. Alghunaim, Ali H. Sayed
TL;DR
This work tackles stochastic minimax optimization where the objective is nonconvex in the minimization variable and either strongly concave or Polyak–Lojasiewicz (PL) in the maximization variable. It introduces two bias-corrected momentum methods (HCMM-1 and HCMM-2) that use Hessian-vector products, including cross-variable Hessian terms, to form more accurate momentum estimates within a two-time-scale gradient descent-ascent scheme. Under Lipschitz Hessian assumptions, the authors establish $O(\varepsilon^{-3})$ iteration complexity with $O(1)$ batch size for both the nonconvex–strongly-concave and nonconvex–PL settings, using a potential-function analysis. Empirical results on distributionally robust logistic regression with real datasets show that the HCMM variants outperform prior bias-corrected momentum methods in both speed and robustness, with HCMM-2 converging faster and HCMM-1 being more resilient to outliers.
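To make the phrase "Hessian-vector products" with "cross-variable Hessian terms" concrete, here is a minimal sketch of how such products can be estimated without ever forming a Hessian matrix. It is not the HCMM-1 or HCMM-2 update: the toy objective, the matrix `B`, and the helper `hvp_fd` are all hypothetical, and the sketch swaps in a central finite difference of gradients where an implementation would more likely use automatic differentiation.

```python
import math

# Hypothetical toy objective (an assumption, not taken from the paper):
#   f(x, y) = x^T B y + sum_i cos(x_i) - 0.5 * ||y||^2
# It is nonconvex in x and strongly concave in y.
B = [[1.0, 2.0], [0.5, -1.0]]


def grad_x(x, y):
    """Gradient of f with respect to x: B y - sin(x)."""
    return [
        sum(B[i][j] * y[j] for j in range(len(y))) - math.sin(x[i])
        for i in range(len(x))
    ]


def hvp_fd(grad, point, direction, eps=1e-5):
    """Central-difference estimate of (Jacobian of grad at point) @ direction.

    Costs two gradient evaluations and never builds the Hessian.
    """
    plus = grad([p + eps * d for p, d in zip(point, direction)])
    minus = grad([p - eps * d for p, d in zip(point, direction)])
    return [(a - b) / (2.0 * eps) for a, b in zip(plus, minus)]


x, y = [0.3, -1.2], [0.7, 0.1]
v, u = [1.0, -2.0], [0.5, 4.0]

# Same-variable term: (Hessian of f in x) applied to v; exact value is -cos(x) * v.
hxx_v = hvp_fd(lambda z: grad_x(z, y), x, v)

# Cross-variable term: (mixed Hessian in x and y) applied to u; exact value is B u.
hxy_u = hvp_fd(lambda z: grad_x(x, z), y, u)
```

In this toy case the cross-variable product equals `B u` and the same-variable product equals `-cos(x) * v` elementwise, so the estimates can be checked against closed forms. Each product costs two extra gradient evaluations, which is what keeps this kind of correction affordable compared with forming a full Hessian.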
Abstract
Lower-bound analyses for nonconvex strongly-concave minimax optimization problems have shown that stochastic first-order algorithms require $\Omega(\varepsilon^{-4})$ oracle complexity to find an $\varepsilon$-stationary point. Some works indicate that this complexity can be improved to $\mathcal{O}(\varepsilon^{-3})$ when the loss gradient is Lipschitz continuous. The question of whether improved convergence rates can be achieved under different conditions remains unresolved. In this work, we address this question for optimization problems that are nonconvex in the minimization variable and strongly concave or Polyak-Lojasiewicz (PL) in the maximization variable. We introduce novel bias-corrected momentum algorithms that use efficient Hessian-vector products. We establish convergence conditions and show that the proposed algorithms attain a lower iteration complexity of $\mathcal{O}(\varepsilon^{-3})$. The effectiveness of the method is validated through applications to robust logistic regression using real-world datasets.
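For readers who want the setting spelled out, the following is one standard way to formalize the problem class named in the abstract. The notation ($f$, $F$, $\xi$, $\mu$, $\Phi$, $d_1$, $d_2$) is assumed here for illustration, and the paper may define its stationarity measure differently.

```latex
\begin{align*}
  % Stochastic min-max problem
  &\min_{x \in \mathbb{R}^{d_1}} \max_{y \in \mathbb{R}^{d_2}} \;
     f(x, y) = \mathbb{E}_{\xi}\big[ F(x, y; \xi) \big] \\
  % Strong concavity in y with modulus \mu > 0, for all x, y, y'
  &f(x, y') \le f(x, y) + \langle \nabla_y f(x, y),\, y' - y \rangle
     - \tfrac{\mu}{2} \| y' - y \|^2 \\
  % PL condition in y with constant \mu > 0 (weaker than strong concavity)
  &\| \nabla_y f(x, y) \|^2 \ge 2\mu \Big( \max_{y'} f(x, y') - f(x, y) \Big) \\
  % One common notion of an \varepsilon-stationary point
  &\Phi(x) = \max_{y} f(x, y), \qquad
     \mathbb{E}\, \| \nabla \Phi(x) \| \le \varepsilon
\end{align*}
```

Strong concavity implies the PL inequality with the same constant $\mu$, so the nonconvex-PL setting is the more general of the two.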
