Efficient Single-Loop Stochastic Algorithms for Nonconvex-Concave Minimax Optimization

Xia Jiang; Linglingzhi Zhu; Taoli Zheng; Anthony Man-Cho So

TL;DR

This work tackles nonconvex-concave (NC-C) finite-sum minimax problems by introducing two single-loop variance-reduced stochastic gradient methods. The PVR-SGDA algorithm combines a probabilistic full-gradient update with Moreau-Yosida smoothing to achieve an $\mathcal{O}(\epsilon^{-4})$ iteration complexity, improving on the best-known rates of stochastic algorithms for NC-C problems. To eliminate full gradient computations, the ZeroSARAH-SGDA variant employs auxiliary gradient trackers and retains a comparable $\mathcal{O}(\epsilon^{-4})$ iteration complexity with $\mathcal{O}(\sqrt{n}\epsilon^{-4})$ gradient calls, at the cost of extra memory for the trackers. Numerical results on robust logistic regression and data poisoning demonstrate faster convergence and favorable trade-offs between accuracy and gradient oracle calls, validating the practical effectiveness of the proposed methods for large-scale NC-C minimax problems.
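The probabilistic full-gradient update can be illustrated with a short Python/NumPy sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation: it pairs a PAGE-style estimator (exact full gradient with probability `p`, otherwise the previous estimate corrected by a minibatch of gradient differences) with smoothed gradient descent-ascent steps on an assumed smoothed function $K(x,z;y)=f(x,y)+\frac{r}{2}\|x-z\|^2$. The form of $K$, the use of separate estimators for $\nabla_x f$ and $\nabla_y f$, all step sizes, the names `PVREstimator` and `pvr_sgda_sketch`, and the toy problem are assumptions made for illustration.

```python
import numpy as np


class PVREstimator:
    """Hypothetical PAGE-style estimator of (1/n) * sum_i grad_i(x, y)."""

    def __init__(self, grad_i, n, p, batch_size, rng):
        self.grad_i, self.n, self.p, self.b, self.rng = grad_i, n, p, batch_size, rng
        self.g = None     # current gradient estimate
        self.prev = None  # query point (x, y) at which self.g was formed

    def step(self, x, y):
        if self.g is None or self.rng.random() < self.p:
            # With probability p (and always at the start): exact full gradient.
            self.g = np.mean([self.grad_i(i, x, y) for i in range(self.n)], axis=0)
        else:
            # Otherwise: correct the old estimate with minibatch gradient differences.
            idx = self.rng.choice(self.n, size=self.b, replace=False)
            px, py = self.prev
            self.g = self.g + np.mean(
                [self.grad_i(i, x, y) - self.grad_i(i, px, py) for i in idx], axis=0)
        self.prev = (x.copy(), y.copy())
        return self.g


def pvr_sgda_sketch(grad_x_i, grad_y_i, n, x0, y0, proj_y, T,
                    c=0.05, alpha=0.05, beta=0.5, r=2.0, p=0.1, b=4, seed=0):
    """Smoothed GDA on the assumed K(x, z; y) = f(x, y) + (r/2)||x - z||^2."""
    rng = np.random.default_rng(seed)
    est_x = PVREstimator(grad_x_i, n, p, b, rng)
    est_y = PVREstimator(grad_y_i, n, p, b, rng)
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(T):
        # Primal descent on K: grad_x K = grad_x f + r (x - z).
        x_new = x - c * (est_x.step(x, y) + r * (x - z))
        # Projected dual ascent, evaluated at the freshly updated x.
        y = proj_y(y + alpha * est_y.step(x_new, y))
        # Move the proximal center z toward the new primal iterate.
        z = z + beta * (x_new - z)
        x = x_new
    return x, y, z


# Hypothetical toy NC-C finite-sum problem (not from the paper):
# f_i(x, y) = 0.5 * sum_k log(1 + x_k^2) + y^T (A_i x - b_i),  y in [-1, 1]^m.
data_rng = np.random.default_rng(1)
n, d, m = 20, 5, 3
A = data_rng.standard_normal((n, m, d))
bvec = data_rng.standard_normal((n, m))


def grad_x_i(i, x, y):
    return x / (1.0 + x**2) + A[i].T @ y


def grad_y_i(i, x, y):
    return A[i] @ x - bvec[i]


x, y, z = pvr_sgda_sketch(grad_x_i, grad_y_i, n, np.ones(d), np.zeros(m),
                          proj_y=lambda v: np.clip(v, -1.0, 1.0), T=200)
```

A useful sanity check: with `p=0.0` and `batch_size=n`, the minibatch corrections telescope, so the estimate stays equal to the exact full gradient at every step. Smaller `p` trades fewer full passes for more reliance on the recursive correction.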

Abstract

Nonconvex-concave (NC-C) finite-sum minimax problems have wide applications in signal processing and machine learning. Conventional stochastic gradient algorithms, which rely on uniform sampling for gradient estimation, often suffer from slow convergence and require bounded-variance assumptions. While variance reduction techniques can significantly improve the convergence of stochastic algorithms, the inherent nonsmooth nature of NC-C problems makes it challenging to design effective variance reduction techniques. To address this challenge, we develop a novel probabilistic variance reduction scheme and propose a single-loop stochastic gradient algorithm called the probabilistic variance-reduced smoothed gradient descent-ascent (PVR-SGDA) algorithm. The proposed PVR-SGDA algorithm achieves an iteration complexity of $\mathcal{O}(\epsilon^{-4})$, surpassing the best-known rates of stochastic algorithms for NC-C minimax problems and matching the performance of state-of-the-art deterministic algorithms. Furthermore, to completely eliminate the need for full gradient computation and reduce the gradient complexity, we explore another variance reduction technique with auxiliary gradient trackers and propose a smoothed gradient descent-ascent algorithm without full gradient calculation, called ZeroSARAH-SGDA, for NC-C problems. The ZeroSARAH-SGDA algorithm achieves an iteration complexity comparable to that of PVR-SGDA while reducing the number of gradient oracle calls per iteration. Finally, we demonstrate the effectiveness of the two proposed algorithms through numerical simulations.
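The idea of auxiliary gradient trackers can be illustrated with the ZeroSARAH estimator from nonconvex finite-sum minimization, after which ZeroSARAH-SGDA is named. The Python/NumPy sketch below is hypothetical and is not the paper's algorithm: it estimates the gradient of a single finite sum $\frac{1}{n}\sum_i f_i$ and omits how the estimator is coupled to the $x$, $y$, and $z$ updates. The class name `ZeroSARAHEstimator`, the mixing weight `lam` and its schedule, the step size, and the least-squares test problem are all assumptions. It shows why no full gradient is needed: each step touches only a minibatch, while the stored per-component gradients (the trackers, and the source of the extra memory) keep the estimate anchored.

```python
import numpy as np


class ZeroSARAHEstimator:
    """ZeroSARAH-style estimator of (1/n) * sum_i grad_i(w) with no full-gradient pass.

    One stored gradient y_i per component (the auxiliary trackers) costs
    O(n * dim) extra memory.
    """

    def __init__(self, grad_i, n, dim, batch_size, rng):
        self.grad_i, self.n, self.b, self.rng = grad_i, n, batch_size, rng
        self.trackers = np.zeros((n, dim))  # y_i, initialised to zero
        self.tracker_mean = np.zeros(dim)   # running (1/n) * sum_i y_i
        self.v = np.zeros(dim)              # previous estimate v^{k-1}
        self.w_prev = None                  # previous query point w^{k-1}

    def step(self, w, lam):
        """Return v^k at w; lam in [0, 1] weights the tracker-based correction."""
        w_prev = w if self.w_prev is None else self.w_prev
        idx = self.rng.choice(self.n, size=self.b, replace=False)
        g_new = np.array([self.grad_i(i, w) for i in idx])
        g_old = np.array([self.grad_i(i, w_prev) for i in idx])
        # SARAH-type recursive part.
        sarah_part = (g_new - g_old).mean(axis=0) + (1.0 - lam) * self.v
        # SAGA-type part: unbiased for the full gradient at w_prev without a full pass.
        saga_part = (g_old - self.trackers[idx]).mean(axis=0) + self.tracker_mean
        self.v = sarah_part + lam * saga_part
        # Refresh the sampled trackers with gradients at the current point.
        self.tracker_mean += (g_new - self.trackers[idx]).sum(axis=0) / self.n
        self.trackers[idx] = g_new
        self.w_prev = np.array(w, copy=True)
        return self.v


# Hypothetical usage on a least-squares finite sum f_i(w) = 0.5 * (a_i^T w - b_i)^2.
rng = np.random.default_rng(0)
n, d = 50, 4
a = rng.standard_normal((n, d))
bvals = rng.standard_normal(n)


def grad_ls(i, w):
    return a[i] * (a[i] @ w - bvals[i])


est = ZeroSARAHEstimator(grad_ls, n, d, batch_size=8, rng=rng)
w = np.zeros(d)
for k in range(300):
    lam = 1.0 if k == 0 else 8 / (2 * n)  # hypothetical schedule with lam_0 = 1
    w = w - 0.05 * est.step(w, lam)
```

Setting `batch_size=n` makes every step return the exact full gradient, which is a convenient sanity check; with `batch_size` much smaller than `n`, each step costs only `2 * batch_size` component gradients.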

Paper Structure (20 sections, 16 theorems, 59 equations, 2 figures, 1 table, 2 algorithms)

Introduction
Motivating Applications
Distributed optimization over multi-agent networks
Power control and transceiver design problem
Problem description and algorithm design
Convergence analysis
Descent Property of $\Phi_t$
Convergence Theorem
Variance-Reduced Smoothed Gradient Descent-Ascent without Full Gradient
Convergence analysis
Numerical Results
Robust Logistic Regression
Data poisoning
Conclusion
Useful Lemmas
...and 5 more sections

Key Result

Lemma 3.1

The function $K(\cdot,z;y)$ is strongly convex with modulus $r-L$, and $\nabla_x K(\cdot,z;y)$ is Lipschitz continuous with constant $L+r$.
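Both claims can be checked in one line each under an assumption not stated in this summary: that, as in smoothed GDA methods, $K(x,z;y)=f(x,y)+\frac{r}{2}\|x-z\|^2$ with $\nabla_x f(\cdot,y)$ being $L$-Lipschitz. A sketch of the derivation under that assumption:

```latex
\begin{aligned}
\nabla_x K(x,z;y) &= \nabla_x f(x,y) + r(x-z),\\
\langle \nabla_x K(x,z;y)-\nabla_x K(x',z;y),\, x-x'\rangle
  &= \langle \nabla_x f(x,y)-\nabla_x f(x',y),\, x-x'\rangle + r\|x-x'\|^2
  \ \ge\ (r-L)\|x-x'\|^2,\\
\|\nabla_x K(x,z;y)-\nabla_x K(x',z;y)\|
  &\le \|\nabla_x f(x,y)-\nabla_x f(x',y)\| + r\|x-x'\|
  \ \le\ (L+r)\|x-x'\|.
\end{aligned}
```

The inequality in the second line combines the Cauchy-Schwarz inequality with the $L$-Lipschitz gradient, so the strong convexity modulus is positive only when $r>L$.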

Figures (2)

Figure 1: (a) Convergence of the PVR-SGDA algorithm with different values of $p$. (b) Performance of different algorithms.
Figure 2: Testing accuracy with respect to gradient oracle calls in data poisoning.

Theorems & Definitions (31)

Remark 3.1
Lemma 3.1
Lemma 4.1
Lemma 4.2
Proposition 4.1
Lemma 4.3 (cf. li2023nonsmooth)
Definition 4.1
Lemma 4.4
Theorem 4.1
Proof
...and 21 more
