A Stochastic GDA Method With Backtracking For Solving Nonconvex (Strongly) Concave Minimax Problems

Qiushui Xu; Xuan Zhang; Necdet Serhat Aybat; Mert Gürbüzbalaban

TL;DR

To our knowledge, SGDA-B is the first GDA-type method with backtracking to solve NCC minimax problems and achieves the best complexity among the methods that are agnostic to $L$.

Abstract

We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-(strongly) concave (NCC) minimax problems $\min_x \max_y \sum_{i=1}^N g_i(x_i)+f(x,y)-h(y)$, where $h$ and $g_i$ for $i = 1, \ldots, N$ are closed, convex functions, and $f$ is $L$-smooth and $\mu$-strongly concave in $y$ for some $\mu\geq 0$. We consider two scenarios: (i) the deterministic setting, where we assume one can compute $\nabla f$ exactly, and (ii) the stochastic setting, where $\nabla f$ is accessible only through an unbiased stochastic oracle with finite variance. While most existing methods assume knowledge of the Lipschitz constant $L$, SGDA-B is agnostic to $L$. Moreover, SGDA-B supports random block-coordinate updates. In the deterministic setting, SGDA-B can compute an $\epsilon$-stationary point within $\mathcal{O}(L\kappa^2/\epsilon^2)$ and $\mathcal{O}(L^3/\epsilon^4)$ gradient calls when $\mu>0$ and $\mu=0$, respectively, where $\kappa=L/\mu$. In the stochastic setting, for any $p \in (0, 1)$ and $\epsilon>0$, it can compute an $\epsilon$-stationary point with probability at least $1-p$ using $\mathcal{O}(L\kappa^3\epsilon^{-4}\log(1/p))$ and $\tilde{\mathcal{O}}(L^4\epsilon^{-7}\log(1/p))$ stochastic oracle calls when $\mu>0$ and $\mu=0$, respectively. To our knowledge, SGDA-B is the first GDA-type method with backtracking to solve NCC minimax problems and achieves the best complexity among the methods that are agnostic to $L$. We also provide numerical results for SGDA-B on a distributionally robust learning problem, illustrating the potential performance gains it can achieve.
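The idea of backtracking on the step size, rather than assuming $L$ is known, can be illustrated with a minimal deterministic sketch. This is not the authors' SGDA-B: the toy objective $f(x,y)=\sin(x)\,y-\tfrac{\mu}{2}y^2$ (with $g_i=h=0$), the acceptance test (gradient norm at most `eps`), the fixed step-size ratio `ratio`, and all function and parameter names are assumptions made purely for illustration.

```python
import math


def grad_f(x, y, mu=1.0):
    """Gradient of the toy objective f(x, y) = sin(x) * y - (mu / 2) * y**2."""
    return math.cos(x) * y, math.sin(x) - mu * y


def gda_run(x, y, eta_x, eta_y, num_iters, eps):
    """Plain GDA with simultaneous updates; returns the last iterate and its gradient norm."""
    for _ in range(num_iters):
        gx, gy = grad_f(x, y)
        norm = math.hypot(gx, gy)
        if norm <= eps:
            return x, y, norm
        if norm > 1e8:  # iterates are diverging: give up on this step size
            break
        # Descent in x and ascent in y, both using the old iterate.
        x, y = x - eta_x * gx, y + eta_y * gy
    gx, gy = grad_f(x, y)
    return x, y, math.hypot(gx, gy)


def gda_with_backtracking(x0, y0, eps=1e-4, eta_y0=8.0, shrink=0.5,
                          ratio=4.0, num_iters=2000, max_trials=30):
    """Try a large dual step size first; shrink it until a GDA run succeeds.

    Illustrative assumptions: the primal step is tied to the dual step by a
    fixed `ratio`, and a trial is accepted when the gradient norm is <= eps.
    """
    eta_y = eta_y0
    for trial in range(max_trials):
        eta_x = eta_y / ratio
        x, y, norm = gda_run(x0, y0, eta_x, eta_y, num_iters, eps)
        if norm <= eps:
            return x, y, eta_y, trial
        eta_y *= shrink  # backtrack: the step size was too aggressive
    raise RuntimeError("no tested step size reached the target accuracy")


if __name__ == "__main__":
    x, y, eta_y, trial = gda_with_backtracking(1.0, 0.0)
    print(f"accepted eta_y={eta_y} after {trial} rejected trial(s); x={x:.2e}, y={y:.2e}")
```

Each rejected trial restarts from the same initial point with a smaller step size, so no estimate of $L$ is supplied by the user; the price is the extra gradient calls spent on the rejected trials.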

Paper Structure (20 sections, 20 theorems, 102 equations, 2 figures, 2 tables, 3 algorithms)

Introduction
Related work
Assumptions and Algorithmic Framework
Weakly Convex-Strongly Concave (WCSC) Setting
RB-SGDA with Jacobi Updates
A method backtracking the dual step-size: SGDA-B
Weakly Convex-Merely Concave (WCMC) Setting
Numerical Experiments
Parameter settings.
Regularized Bilinear Problem with Synthetic Data.
Distributed Robust Optimization with Neural Network.
Conclusion
SGDA-B with Gauss-Seidel Updates
Derivation of Computational Complexities in Tables \ref{table_stoc} and \ref{table_deter}
Derivation of complexity in \cite{junchinest}
...and 5 more sections

Key Result

Lemma 1

For any given $\epsilon>0$, let $M_x, M_y$ be chosen as in line \ref{algeq:M} of SGDA-B, displayed in Algorithm \ref{alg:GDA-B}. Then, for any $(\mathbf{x},y)\in\mathop{\bf dom} f\times \mathop{\bf dom} g$ and any scalar $r>0$, it holds that

Figures (2)

Figure 1: Comparison of SGDA-B against other algorithms, GDA \cite{lin2020gradient}, AGDA \cite{boct2020alternating}, TiAda \cite{li2022tiada}, sm-AGDA \cite{yang2022faster} and VRLM \cite{mancino2023variance}, on synthetic data for solving \eqref{eq:bilinear-problem}, over $10$ simulation runs.
Figure 2: Comparison of SGDA-B against other algorithms, GDA \cite{lin2020gradient}, AGDA \cite{boct2020alternating}, TiAda \cite{li2022tiada} and VRLM \cite{mancino2023variance}, on real data for solving \eqref{eq:dro-problmm}, over $10$ simulation runs. "Train error" denotes the fraction of incorrect predictions, and "loss" denotes $F(\mathbf{x})=\max_{y\in\mathcal{Y}}\mathcal{L}(\mathbf{x},y)$. One epoch means one complete pass over the data set.

Theorems & Definitions (53)

Definition 1
Definition 2
Remark 1
Definition 3
Remark 2
Lemma 1
proof
Definition 4
Lemma 2
proof
...and 43 more
