A Stochastic GDA Method With Backtracking For Solving Nonconvex (Strongly) Concave Minimax Problems

Qiushui Xu, Xuan Zhang, Necdet Serhat Aybat, Mert Gürbüzbalaban

TL;DR

To our knowledge, SGDA-B is the first GDA-type method with backtracking for solving NCC minimax problems, and it achieves the best complexity among the methods that are agnostic to $L$.

Abstract

We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-(strongly) concave (NCC) minimax problems $\min_x \max_y \sum_{i=1}^N g_i(x_i)+f(x,y)-h(y)$, where $h$ and $g_i$ for $i = 1, \ldots, N$ are closed, convex functions, and $f$ is $L$-smooth and $\mu$-strongly concave in $y$ for some $\mu\geq 0$. We consider two scenarios: (i) the deterministic setting, where we assume one can compute $\nabla f$ exactly, and (ii) the stochastic setting, where we only have access to $\nabla f$ through an unbiased stochastic oracle with finite variance. While most existing methods assume knowledge of the Lipschitz constant $L$, SGDA-B is agnostic to $L$. Moreover, SGDA-B can support random block-coordinate updates. In the deterministic setting, SGDA-B can compute an $\epsilon$-stationary point within $\mathcal{O}(L\kappa^2/\epsilon^2)$ and $\mathcal{O}(L^3/\epsilon^4)$ gradient calls when $\mu>0$ and $\mu=0$, respectively, where $\kappa=L/\mu$. In the stochastic setting, for any $p \in (0, 1)$ and $\epsilon>0$, it can compute an $\epsilon$-stationary point with probability at least $1-p$, which requires $\mathcal{O}(L\kappa^3\epsilon^{-4}\log(1/p))$ and $\tilde{\mathcal{O}}(L^4\epsilon^{-7}\log(1/p))$ stochastic oracle calls when $\mu>0$ and $\mu=0$, respectively. To our knowledge, SGDA-B is the first GDA-type method with backtracking for solving NCC minimax problems, and it achieves the best complexity among the methods that are agnostic to $L$. We also provide numerical results for SGDA-B on a distributionally robust learning problem, illustrating the potential performance gains that SGDA-B can achieve.
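To make the idea of being agnostic to $L$ concrete, here is a minimal Python sketch of deterministic GDA with a backtracking loop over a guessed smoothness constant. It is not the SGDA-B algorithm from the paper, whose details are not given in this summary. The toy objective $f(x,y)=\sin(x)+xy-\tfrac{1}{2}y^2$, the step-size rule $\eta_y = 1/L$, $\eta_x = \eta_y/\kappa^2$, the doubling of the guess, and the names `grad_f`, `gda_with_backtracking`, `L_guess`, and `inner_iters` are all assumptions chosen for illustration. The sketch also assumes $\mu$ is known and sets $g_i = h = 0$.

```python
import math


def grad_f(x, y):
    # Toy objective (assumed for illustration, not from the paper):
    #   f(x, y) = sin(x) + x*y - 0.5*y**2
    # It is nonconvex in x and 1-strongly concave in y.
    return math.cos(x) + y, x - y


def gda_with_backtracking(x0, y0, mu=1.0, eps=1e-6, L_guess=0.25,
                          inner_iters=2000, max_backtracks=30):
    """Run GDA with step sizes derived from a guess of L; if the run does not
    reach an eps-stationary point, double the guess and restart."""
    for _ in range(max_backtracks):
        kappa = max(1.0, L_guess / mu)   # guessed condition number
        eta_y = 1.0 / L_guess            # ascent step (assumed rule)
        eta_x = eta_y / kappa ** 2       # smaller descent step (assumed rule)
        x, y = x0, y0
        for _ in range(inner_iters):
            gx, gy = grad_f(x, y)
            gnorm = math.hypot(gx, gy)
            if gnorm <= eps:
                return x, y, L_guess     # eps-stationary point found
            if gnorm > 1e8:
                break                    # iterates are blowing up: guess too small
            # Simultaneous update: descend in x, ascend in y.
            x, y = x - eta_x * gx, y + eta_y * gy
        L_guess *= 2.0                   # backtrack: shrink both step sizes
    raise RuntimeError("no eps-stationary point found within max_backtracks")


if __name__ == "__main__":
    x, y, L_used = gda_with_backtracking(2.0, 0.0)
    print(f"x = {x:.6f}, y = {y:.6f}, accepted L guess = {L_used}")
```

The stationarity check is evaluated at the current iterate, so any point the sketch returns satisfies the gradient-norm tolerance regardless of which guess of $L$ was accepted. For this toy problem the stationary point solves $\cos(x) + x = 0$ with $y = x$.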

Paper Structure

This paper contains 20 sections, 20 theorems, 102 equations, 2 figures, 2 tables, 3 algorithms.

Key Result

Lemma 1

For any given $\epsilon>0$, let $M_x, M_y$ be chosen as in line \ref{algeq:M} of SGDA-B, displayed in Algorithm \ref{alg:GDA-B}. Then, for any $(\mathbf{x},y)\in\mathop{\bf dom} f\times \mathop{\bf dom} g$ and any scalar $r>0$, it holds that

Figures (2)

  • Figure 1: Comparison of SGDA-B against other algorithms, GDA \cite{lin2020gradient}, AGDA \cite{boct2020alternating}, TiAda \cite{li2022tiada}, sm-AGDA \cite{yang2022faster}, and VRLM \cite{mancino2023variance}, on synthetic data for solving \ref{eq:bilinear-problem} over $10$ simulation runs.
  • Figure 2: Comparison of SGDA-B against other algorithms, GDA \cite{lin2020gradient}, AGDA \cite{boct2020alternating}, TiAda \cite{li2022tiada}, and VRLM \cite{mancino2023variance}, on real data for solving \ref{eq:dro-problmm} over $10$ simulation runs. "Train error" denotes the fraction of incorrect predictions, and "loss" denotes $F(\mathbf{x})=\max_{y\in\mathcal{Y}}\mathcal{L}(\mathbf{x},y)$. One epoch means one complete pass over the data set.

Theorems & Definitions (53)

  • Definition 1
  • Definition 2
  • Remark 1
  • Definition 3
  • Remark 2
  • Lemma 1
  • proof
  • Definition 4
  • Lemma 2
  • proof
  • ...and 43 more