A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo

TL;DR

This work tackles nonconvex-concave min–max problems, where standard gradient descent-ascent can oscillate and fail to converge. It proposes a single-loop Smoothed-GDA method that injects proximal smoothing via an auxiliary sequence $\{z^t\}$ and a quadratic term in the primal update, ensuring convergence to stationary points. The authors prove that the Smoothed-GDA framework achieves $O\left(\varepsilon^{-2}\right)$ iteration complexity for the finite-max special case and $O\left(\varepsilon^{-4}\right)$ for general nonconvex-concave problems, with an extension to multi-block settings. They validate the approach experimentally on robust neural-network training, where the method exhibits faster convergence and competitive robustness compared to existing algorithms, indicating practical impact for scalable, stable minimax optimization in machine learning.
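The smoothing step summarized above can be written out as a worked update rule. The following is a sketch, not a quotation from the paper: $f(x,y)$ denotes the objective, $P_X$ and $P_Y$ are projections onto the feasible sets, $c$ and $p$ are the primal step size and smoothing weight, and the dual step size $\alpha$ and averaging weight $\beta$ are symbols introduced here for illustration.

```latex
% Smoothed objective: the original f plus a quadratic (proximal) term centered at z
K(x, z; y) = f(x, y) + \frac{p}{2}\,\lVert x - z \rVert^{2}

% One single-loop iteration
x^{t+1} = P_X\!\left( x^{t} - c\,\nabla_x K(x^{t}, z^{t}; y^{t}) \right)
y^{t+1} = P_Y\!\left( y^{t} + \alpha\,\nabla_y K(x^{t+1}, z^{t}; y^{t}) \right)
z^{t+1} = z^{t} + \beta\,\left( x^{t+1} - z^{t} \right)
```

Setting $p = 0$ removes the quadratic term and leaves an alternating projected GDA step, so the auxiliary sequence $\{z^t\}$ is the only addition to the basic scheme.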

Abstract

Nonconvex-concave min-max problems arise in many machine learning applications, including minimizing a pointwise maximum of a set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solving such problems is the gradient descent-ascent (GDA) algorithm, which unfortunately can oscillate when the objective is nonconvex. In this paper, we introduce a "smoothing" scheme which can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm can achieve an $O(1/\varepsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions. Moreover, the smoothed GDA algorithm achieves an $O(1/\varepsilon^4)$ iteration complexity for general nonconvex-concave problems. Extensions of this stabilized GDA algorithm to multi-block cases are presented. To the best of our knowledge, this is the first algorithm to achieve $O(1/\varepsilon^2)$ for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.
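To make the oscillation and its stabilization concrete, here is a minimal Python sketch under stated assumptions. The toy objective $f(x,y) = xy$ on the box $[-1,1]^2$ is not from the paper (it is bilinear, a degenerate case chosen only because plain GDA visibly cycles on it), the function name `smoothed_gda` is hypothetical, and the values of `alpha` and `beta` are illustrative rather than the paper's prescribed bounds. With gradient Lipschitz constant $L = 1$, the choices `p = 4.0` and `c = 0.1` satisfy $p > 3L$ and $c < 1/(p+L)$.

```python
import numpy as np


def smoothed_gda(grad_x, grad_y, x0, y0, *, c, alpha, beta, p, steps,
                 lo=-1.0, hi=1.0):
    """Sketch of a single-loop smoothed GDA on the box [lo, hi].

    Assumption: K(x, z; y) = f(x, y) + (p / 2) * ||x - z||^2.
    With p = 0 the quadratic term vanishes and this is plain alternating GDA.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    z = x.copy()  # auxiliary sequence, started at x0
    for _ in range(steps):
        # Primal step: projected gradient descent on the smoothed objective K.
        x = np.clip(x - c * (grad_x(x, y) + p * (x - z)), lo, hi)
        # Dual step: projected gradient ascent, using the updated x.
        y = np.clip(y + alpha * grad_y(x, y), lo, hi)
        # Averaging step: move z a fraction beta toward the new x.
        z = z + beta * (x - z)
    return x, y


# Toy problem (illustrative only): f(x, y) = x * y, stationary point (0, 0).
def grad_x(x, y):
    return y


def grad_y(x, y):
    return x


common = dict(c=0.1, alpha=0.1, beta=0.1, steps=3000)
x_gda, y_gda = smoothed_gda(grad_x, grad_y, [0.2], [0.2], p=0.0, **common)
x_sm, y_sm = smoothed_gda(grad_x, grad_y, [0.2], [0.2], p=4.0, **common)

# Distance of the final iterate from the stationary point (0, 0).
gda_norm = float(np.hypot(x_gda[0], y_gda[0]))
smoothed_norm = float(np.hypot(x_sm[0], y_sm[0]))
print(f"plain GDA:    {gda_norm:.3e}")
print(f"smoothed GDA: {smoothed_norm:.3e}")
```

On this toy problem the plain iteration keeps circling the origin at a roughly constant radius, while the smoothed iteration spirals inward. The sketch says nothing about the complexity rates stated in the abstract.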

Paper Structure

This paper contains 27 sections, 37 theorems, 211 equations, 2 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.4

Consider solving problem minimax1 by Algorithm Alg2 (or Algorithm Alg3). Suppose Assumption basic-ass holds, and we choose the algorithm parameters to satisfy $p>3L,\ c<1/(p+L)$ and Then, the following holds:

Figures (2)

  • Figure 1: Convergence speed of Smoothed-GDA and the algorithm of Nouiehed et al. (2019).
  • Figure 2: Convergence speed of Smoothed-GDA and the algorithm of Nouiehed et al. (2019) on CIFAR10.

Theorems & Definitions (42)

  • Definition 3.1
  • Definition 3.2
  • Theorem 3.4
  • Definition 3.5
  • Theorem 3.6
  • Theorem 3.10
  • Proposition 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma B.1
  • ...and 32 more