Shuffling Gradient Descent-Ascent with Variance Reduction for Nonconvex-Strongly Concave Smooth Minimax Problems

Xia Jiang; Linglingzhi Zhu; Anthony Man-Cho So; Shisheng Cui; Jian Sun

TL;DR

This paper proposes a novel single-loop stochastic gradient descent-ascent (GDA) algorithm that employs both shuffling schemes and variance reduction to solve nonconvex-strongly concave smooth minimax problems and achieves $\epsilon$-stationarity in expectation in $\mathcal{O}(\kappa^2 \epsilon^{-2})$ iterations.

Abstract

In recent years, there has been considerable interest in designing stochastic first-order algorithms to tackle finite-sum smooth minimax problems. To obtain the gradient estimates, one typically relies on the uniform sampling-with-replacement scheme or various sampling-without-replacement (also known as shuffling) schemes. While the former is easier to analyze, the latter often have better empirical performance. In this paper, we propose a novel single-loop stochastic gradient descent-ascent (GDA) algorithm that employs both shuffling schemes and variance reduction to solve nonconvex-strongly concave smooth minimax problems. We show that the proposed algorithm achieves $\epsilon$-stationarity in expectation in $\mathcal{O}(\kappa^2 \epsilon^{-2})$ iterations, where $\kappa$ is the condition number of the problem. This outperforms existing shuffling schemes and matches the complexity of the best-known sampling-with-replacement algorithms. Our proposed algorithm also achieves the same complexity as that of its deterministic counterpart, the two-timescale GDA algorithm. Our numerical experiments demonstrate the superior performance of the proposed algorithm.
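To make the distinction between the two sampling schemes concrete, the following is a minimal sketch of how the component indices of a finite sum with n terms could be generated under each scheme. The function names are hypothetical, the two shuffling variants shown (random reshuffling and shuffle-once) are common choices rather than a statement of which ones the paper analyzes, and the variance-reduction and descent-ascent updates of the proposed algorithm are not reproduced here.

```python
import random


def with_replacement_indices(n, num_steps, rng):
    """Uniform sampling with replacement: every step draws an index
    independently, so an index may repeat or be missed within an epoch."""
    return [rng.randrange(n) for _ in range(num_steps)]


def random_reshuffling_indices(n, num_epochs, rng):
    """Random reshuffling: draw a fresh permutation of 0..n-1 each epoch,
    so every component is visited exactly once per epoch."""
    order = []
    for _ in range(num_epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order


def shuffle_once_indices(n, num_epochs, rng):
    """Shuffle-once: draw a single permutation and reuse it every epoch."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm * num_epochs
```

Under either shuffling variant, each epoch of n steps touches every component exactly once, whereas sampling with replacement gives no such guarantee.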

Paper Structure (11 sections, 6 theorems, 33 equations, 2 figures, 1 algorithm)

Introduction
Problem Setup and Preliminaries
Shuffling Gradient Descent-Ascent with Variance Reduction
Convergence analysis
Basic Descent Properties
Controlling Bias in Shuffling Algorithms
Main Convergence Theorem
Numerical Experiments
Data Poisoning against Logistic Regression
Distributionally Robust Optimization
Conclusion

Key Result

Lemma 2.1

Under the paper's assumption on $f$ (label f_assump), the function $\Phi$ is $(l+\kappa l)$-smooth with $\nabla \Phi(x)=\nabla_x f(x, y^*(x))$, where $y^*(x)$ is the unique element of $\operatorname{argmax}_{y\in \mathbb{R}^d} f(x,y)$. Also, $y^*(\cdot)$ is $\kappa$-Lipschitz.
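
As an illustrative check of the lemma (a toy instance chosen here, not taken from the paper), consider the scalar function $f(x,y)=axy-\frac{\mu}{2}y^2$ with $a,\mu>0$, which is $\mu$-strongly concave in $y$. The computation assumes the standard definitions $\Phi(x)=\max_{y} f(x,y)$ and $\kappa=l/\mu$, which this summary does not state explicitly:

$$
\begin{aligned}
y^*(x) &= \operatorname{argmax}_{y} f(x,y) = \frac{a}{\mu}\,x, \\
\Phi(x) &= f(x, y^*(x)) = \frac{a^2}{2\mu}\,x^2, \\
\nabla \Phi(x) &= \frac{a^2}{\mu}\,x = a\,y^*(x) = \nabla_x f(x, y^*(x)).
\end{aligned}
$$

The Hessian of $f$ is constant with spectral norm $l=\frac{1}{2}\left(\mu+\sqrt{\mu^2+4a^2}\right)\ge a$. Hence $y^*(\cdot)$ has Lipschitz constant $a/\mu \le l/\mu = \kappa$, and $\Phi$ has smoothness constant $a^2/\mu \le \kappa l \le l+\kappa l$, consistent with the bounds in the lemma.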

Figures (2)

Figure 1: Iterative performance of SGDA, SREDA, and the proposed algorithm in data poisoning.
Figure 2: Value of $\Phi(x_t)$ versus the number of gradient oracle calls for the algorithms compared in robust logistic regression.

Theorems & Definitions (14)

Lemma 2.1: cf. SGDA
Definition 2.1
Remark 3.1
Lemma 4.1
proof
Lemma 4.2
proof
Lemma 4.3
proof
Theorem 4.1
...and 4 more
