Shuffling Gradient Descent-Ascent with Variance Reduction for Nonconvex-Strongly Concave Smooth Minimax Problems

Xia Jiang, Linglingzhi Zhu, Anthony Man-Cho So, Shisheng Cui, Jian Sun

TL;DR

This paper proposes a novel single-loop stochastic gradient descent-ascent (GDA) algorithm that employs both shuffling schemes and variance reduction to solve nonconvex-strongly concave smooth minimax problems and achieves $\epsilon$-stationarity in expectation in $\mathcal{O}(\kappa^2 \epsilon^{-2})$ iterations.

Abstract

In recent years, there has been considerable interest in designing stochastic first-order algorithms to tackle finite-sum smooth minimax problems. To obtain the gradient estimates, one typically relies on the uniform sampling-with-replacement scheme or various sampling-without-replacement (also known as shuffling) schemes. While the former is easier to analyze, the latter often have better empirical performance. In this paper, we propose a novel single-loop stochastic gradient descent-ascent (GDA) algorithm that employs both shuffling schemes and variance reduction to solve nonconvex-strongly concave smooth minimax problems. We show that the proposed algorithm achieves $\epsilon$-stationarity in expectation in $\mathcal{O}(\kappa^2 \epsilon^{-2})$ iterations, where $\kappa$ is the condition number of the problem. This outperforms existing shuffling schemes and matches the complexity of the best-known sampling-with-replacement algorithms. Our proposed algorithm also achieves the same complexity as that of its deterministic counterpart, the two-timescale GDA algorithm. Our numerical experiments demonstrate the superior performance of the proposed algorithm.
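The two sampling schemes contrasted in the abstract can be illustrated with a minimal sketch. It only shows how component indices might be drawn in one pass over $n$ components; it is not the paper's algorithm and includes no variance reduction. The function names and the choice of random reshuffling as the shuffling scheme are assumptions made for illustration.

```python
import random

def with_replacement_indices(n, rng):
    """n independent uniform draws from {0, ..., n-1}; repeats are possible."""
    return [rng.randrange(n) for _ in range(n)]

def shuffled_indices(n, rng):
    """One epoch of random reshuffling: a fresh permutation of {0, ..., n-1}."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 8
wr = with_replacement_indices(n, rng)  # may skip some components and repeat others
rr = shuffled_indices(n, rng)          # visits every component exactly once
```

Under shuffling, every component gradient is used exactly once per epoch, whereas sampling with replacement gives no such guarantee within any fixed window of $n$ draws.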

Paper Structure (11 sections, 6 theorems, 33 equations, 2 figures, 1 algorithm)

Key Result

Lemma 2.1

Under the paper's assumption on $f$ (label f_assump in the source), the function $\Phi$ is $(l+\kappa l)$-smooth with $\nabla \Phi(x)=\nabla_x f(x, y^*(x))$, where $\operatorname{argmax}_{y\in \mathbb{R}^d} f(x,y)$ is a singleton $\{y^*(x)\}$. Moreover, $y^*(\cdot)$ is $\kappa$-Lipschitz.
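The three claims of the lemma can be checked by hand on a toy problem. The sketch below uses a hypothetical scalar instance $f(x,y) = a x y - \tfrac{\mu}{2} y^2$, which does not come from the paper. It also assumes the standard conventions $\Phi(x) = \max_y f(x,y)$ and $\kappa = l/\mu$, where $l$ is the smoothness constant of $f$ and $\mu$ the strong-concavity modulus in $y$; the source excerpt does not spell these out.

```python
import math

# Hypothetical toy instance: f(x, y) = a*x*y - (mu/2)*y**2 with scalar x, y.
a, mu = 2.0, 1.0

# Smoothness constant of f: spectral norm of its constant Hessian [[0, a], [a, -mu]].
l = (mu + math.sqrt(mu**2 + 4 * a**2)) / 2
kappa = l / mu  # assumed definition of the condition number

def y_star(x):
    # Unique maximizer of y -> f(x, y): solve a*x - mu*y = 0.
    return a * x / mu

def grad_x_f(x, y):
    # Partial derivative of f with respect to x.
    return a * y

def phi(x):
    # Phi(x) = max_y f(x, y), evaluated at the maximizer y_star(x).
    y = y_star(x)
    return a * x * y - 0.5 * mu * y**2

phi_smoothness = a**2 / mu   # Phi(x) = a^2 x^2 / (2 mu), so Phi'' = a^2 / mu
y_star_lipschitz = a / mu    # slope of the linear map y_star

# Central-difference estimate of Phi'(x0), to compare with grad_x f(x0, y*(x0)).
x0, h = 0.7, 1e-5
fd_grad = (phi(x0 + h) - phi(x0 - h)) / (2 * h)
```

In this instance $\Phi'' = a^2/\mu$ stays below $l + \kappa l$ and the slope $a/\mu$ of $y^*$ stays below $\kappa$, because $a \le l$; the finite-difference value agrees with $\nabla_x f(x_0, y^*(x_0))$ up to rounding.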

Figures (2)

  • Figure 1: Iterative performance of SGDA, SREDA and Algorithm 1 in data poisoning.
  • Figure 2: Performance of $\Phi(x_t)$ with respect to the number of gradient oracles for algorithms in robust logistic regression.

Theorems & Definitions (14)

  • Lemma 2.1: cf. SGDA
  • Definition 2.1
  • Remark 3.1
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Theorem 4.1
  • ...and 4 more