Efficient Single-Loop Stochastic Algorithms for Nonconvex-Concave Minimax Optimization
Xia Jiang, Linglingzhi Zhu, Taoli Zheng, Anthony Man-Cho So
TL;DR
This work tackles nonconvex-concave finite-sum minimax problems by introducing two single-loop variance-reduced stochastic gradient methods. The PVR-SGDA algorithm uses a probabilistic full-gradient update together with Moreau-Yosida smoothing to achieve $\mathcal{O}(\epsilon^{-4})$ iteration complexity, improving over existing stochastic rates. To eliminate full gradient computations, the ZeroSARAH-SGDA variant employs auxiliary gradient trackers and retains a comparable $\mathcal{O}(\epsilon^{-4})$ iteration complexity with $\mathcal{O}(\sqrt{n}\epsilon^{-4})$ gradient calls, at the cost of extra memory. Numerical results on robust logistic regression and data poisoning demonstrate faster convergence and favorable gradient-efficiency trade-offs, validating the practical effectiveness of the proposed methods for large-scale NC-C minimax problems.
Abstract
Nonconvex-concave (NC-C) finite-sum minimax problems have wide applications in signal processing and machine learning tasks. Conventional stochastic gradient algorithms, which rely on uniform sampling for gradient estimation, often suffer from slow convergence rates and require bounded variance assumptions. While variance reduction techniques can significantly improve the convergence of stochastic algorithms, the inherently nonsmooth nature of NC-C problems makes it challenging to design effective variance reduction techniques. To address this challenge, we develop a novel probabilistic variance reduction scheme and propose a single-loop stochastic gradient algorithm called the probabilistic variance-reduced smoothed gradient descent-ascent (PVR-SGDA) algorithm. The proposed PVR-SGDA algorithm achieves an iteration complexity of $\mathcal{O}(\epsilon^{-4})$, surpassing the best-known rates of stochastic algorithms for NC-C minimax problems and matching the performance of state-of-the-art deterministic algorithms. Furthermore, to completely eliminate the need for full gradient computation and reduce the gradient complexity, we explore another variance reduction technique with auxiliary gradient trackers and propose a smoothed gradient descent-ascent algorithm without full gradient calculation, called ZeroSARAH-SGDA, for NC-C problems. The ZeroSARAH-SGDA algorithm achieves an iteration complexity comparable to that of PVR-SGDA while reducing the number of gradient oracle calls per iteration. Finally, we demonstrate the effectiveness of the two proposed algorithms through numerical simulations.
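To make the idea of a probabilistic variance-reduced estimator inside a single-loop smoothed gradient descent-ascent scheme more concrete, the following is a minimal sketch and not the authors' PVR-SGDA. It assumes a toy finite sum of scalar quadratics (convex-concave, chosen only so the saddle point is known), a simultaneous update of the primal and dual variables, and a generic "full gradient with probability `p`, otherwise a one-sample recursive correction" estimator. All function names, the parameters `p`, `c`, `alpha`, `beta`, `r`, and the step sizes are hypothetical choices made for illustration.

```python
import random


def grad_i(i, x, y, data):
    """Gradient of the toy component f_i(x, y) = 0.5*a_i*x**2 + b_i*x*y - 0.5*y**2."""
    a, b = data[i]
    return (a * x + b * y, b * x - y)


def full_grad(x, y, data):
    """Average of all component gradients at (x, y)."""
    n = len(data)
    grads = [grad_i(i, x, y, data) for i in range(n)]
    return (sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n)


def prob_vr_estimate(g_prev, prev_pt, cur_pt, data, p, rng):
    """With probability p recompute the full gradient; otherwise correct the
    previous estimate using a single uniformly sampled component."""
    if rng.random() < p:
        return full_grad(*cur_pt, data)
    i = rng.randrange(len(data))
    g_cur = grad_i(i, *cur_pt, data)
    g_old = grad_i(i, *prev_pt, data)
    return (g_prev[0] + g_cur[0] - g_old[0], g_prev[1] + g_cur[1] - g_old[1])


def smoothed_gda(data, steps=200, p=0.2, c=0.05, alpha=0.05, beta=0.5, r=1.0, seed=0):
    """Single-loop sketch: descent in x on f + (r/2)*(x - z)**2, ascent in y,
    and an averaging step that moves the auxiliary point z toward x."""
    rng = random.Random(seed)
    x, y, z = 1.0, 1.0, 1.0
    g = full_grad(x, y, data)  # initialise the estimator with one full gradient
    for _ in range(steps):
        x_new = x - c * (g[0] + r * (x - z))  # smoothed primal descent step
        y_new = y + alpha * g[1]              # dual ascent step (simultaneous update)
        z = z + beta * (x_new - z)            # move the proximal centre toward x
        g = prob_vr_estimate(g, (x, y), (x_new, y_new), data, p, rng)
        x, y = x_new, y_new
    return x, y, z
```

For instance, `smoothed_gda([(1.0, 0.5), (2.0, -0.5), (0.5, 1.0)], p=1.0)` runs the deterministic special case in which every estimate is a full gradient, while a smaller `p` trades full-gradient passes for cheaper single-sample corrections. The quadratic term `(r/2)*(x - z)**2` stands in for the Moreau-Yosida smoothing mentioned above; a tracker-based estimator in the spirit of ZeroSARAH-SGDA would replace the full-gradient branch and is not shown here.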
