A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems
Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo
TL;DR
This work tackles nonconvex-concave min-max problems, where standard gradient descent-ascent can oscillate and fail to converge. It proposes a single-loop Smoothed-GDA method that adds proximal smoothing through an auxiliary sequence $\{z^t\}$ and a quadratic term in the primal update, ensuring convergence to stationary points. The authors prove that Smoothed-GDA achieves $O(\varepsilon^{-2})$ iteration complexity for the finite-max special case and $O(\varepsilon^{-4})$ for general nonconvex-concave problems, with an extension to multi-block settings. They validate the approach experimentally on robust neural-network training, where the method converges faster than existing algorithms while achieving competitive robustness.
Abstract
Nonconvex-concave min-max problems arise in many machine learning applications, including minimizing a pointwise maximum of a set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solving these problems is the gradient descent-ascent (GDA) algorithm, which unfortunately can oscillate when the objective is nonconvex. In this paper, we introduce a "smoothing" scheme that can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm achieves an $O(1/\varepsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions. Moreover, the smoothed GDA algorithm achieves an $O(1/\varepsilon^4)$ iteration complexity for general nonconvex-concave problems. Extensions of this stabilized GDA algorithm to multi-block cases are presented. To the best of our knowledge, this is the first algorithm to achieve $O(1/\varepsilon^2)$ iteration complexity for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.
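
To make the smoothing idea concrete, the following is a minimal sketch rather than the authors' reference implementation. It assumes the scheme takes a projected gradient step on $f(x, y) + \frac{p}{2}\|x - z\|^2$ in $x$, a projected ascent step in $y$, and then moves the auxiliary point $z$ a fraction $\beta$ of the way toward the new $x$. The function names (`smoothed_gda`, `clip`), the parameter names (`c`, `alpha`, `beta`, `p`), the update order, and the step-size values are all assumptions made for illustration. The toy objective $f(x, y) = xy$ on the box $[-1, 1]^2$ is bilinear (convex-concave, not genuinely nonconvex) and is used only because the oscillation of unsmoothed GDA is easy to observe on it.

```python
import math


def clip(v, lo=-1.0, hi=1.0):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def smoothed_gda(grad_x, grad_y, x0, y0, steps, c, alpha, beta, p):
    """Single-loop smoothed GDA sketch for scalar x and y (hypothetical helper).

    c, alpha : primal and dual step sizes (assumed names)
    beta     : averaging weight for the auxiliary point z
    p        : weight of the quadratic term (p/2) * (x - z)**2;
               with p = 0 the term vanishes and z has no effect,
               so the loop reduces to plain alternating GDA
    """
    x, y, z = x0, y0, x0
    for _ in range(steps):
        # Primal descent step on f(x, y) + (p/2) * (x - z)**2
        x = clip(x - c * (grad_x(x, y) + p * (x - z)))
        # Dual ascent step, evaluated at the updated x
        y = clip(y + alpha * grad_y(x, y))
        # Move the auxiliary point toward the new primal iterate
        z = z + beta * (x - z)
    return x, y


# Toy objective f(x, y) = x * y; its only stationary point is (0, 0).
def grad_x(x, y):
    return y


def grad_y(x, y):
    return x


# Smoothed run: the iterates should approach (0, 0).
x_s, y_s = smoothed_gda(grad_x, grad_y, 0.5, 0.5,
                        steps=500, c=0.5, alpha=0.5, beta=0.5, p=1.0)

# Unsmoothed run (p = 0): the iterates keep circling the origin.
x_p, y_p = smoothed_gda(grad_x, grad_y, 0.5, 0.5,
                        steps=500, c=0.5, alpha=0.5, beta=0.5, p=0.0)

print("smoothed distance from origin:", math.hypot(x_s, y_s))
print("plain distance from origin:   ", math.hypot(x_p, y_p))
```

On this toy problem the only difference between the two runs is the quadratic term: with `p = 0` the alternating updates preserve the quantity $x^2 + y^2 - 0.5\,xy$, so the iterates stay on a closed orbit, whereas with `p > 0` the pull toward the slowly moving $z$ damps the rotation. The step sizes here are chosen for this example only and are not the conditions required by the paper's analysis.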
