Near-Optimal Algorithms for Making the Gradient Small in Stochastic Minimax Optimization

Lesi Chen; Luo Luo

TL;DR

This work addresses finding near-stationary points in stochastic minimax optimization by introducing Recursive Anchored Iteration (RAIN), a framework that solves a sequence of recursively anchored (regularized) subproblems to drive the gradient norm down efficiently. The core idea combines anchored regularization with a stochastic extragradient subroutine (Epoch-SEG), yielding near-optimal stochastic first-order oracle (SFO) complexity in the convex-concave (CC) and strongly-convex-strongly-concave (SCSC) settings. The authors extend the approach to nonconvex-nonconcave (NC) problems via saddle envelopes and introduce RAIN$^{++}$, which uses multilevel Monte Carlo (MLMC) debiasing, to obtain similar near-optimal guarantees under comonotone and intersection-dominant conditions. Numerical experiments in both the CC and NC regimes show the proposed methods outperforming established baselines. Overall, the paper advances both the methodological and the complexity-theoretic understanding of stochastic minimax optimization and provides practical algorithms with strong guarantees.
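To make the recursive anchoring idea concrete, here is a minimal NumPy sketch under stated assumptions. Anchoring adds $\lambda_j (z - z_j)$ to the gradient operator $F(z) = (\nabla_x f, -\nabla_y f)$, which makes every subproblem strongly monotone; each stage approximately solves the current anchored subproblem and then places a new anchor at the returned point. The toy bilinear problem $f(x,y) = x^\top A y$, the Gaussian noise model, the starting weight `lam0`, the doubling of anchor weights, and the plain stochastic extragradient loop with iterate averaging (standing in for Epoch-SEG) are all hypothetical choices, not the paper's exact RAIN procedure or parameters.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): f(x, y) = x^T A y.
# It is convex-concave, and its unique saddle point is the origin since A is invertible.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
n = A.shape[0]


def F(z):
    """Gradient operator F(z) = (grad_x f, -grad_y f) of the toy problem."""
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -A.T @ x])


def anchored_oracle(z, anchors, rng, sigma):
    """Noisy operator of f + sum_j (lam_j / 2) (||x - x_j||^2 - ||y - y_j||^2).

    Gaussian noise with standard deviation sigma simulates a stochastic
    first-order oracle (an assumption). Each anchor adds lam_j * (z - z_j).
    """
    g = F(z) + sigma * rng.standard_normal(z.shape)
    for lam, z_anchor in anchors:
        g = g + lam * (z - z_anchor)
    return g


def seg(z, anchors, rng, sigma, steps):
    """Plain stochastic extragradient, averaging the second half of the iterates."""
    lip = np.linalg.norm(A, 2) + sum(lam for lam, _ in anchors)
    eta = 1.0 / (4.0 * lip)  # conservative constant step size (an assumption)
    total, count = np.zeros_like(z), 0
    for t in range(steps):
        z_half = z - eta * anchored_oracle(z, anchors, rng, sigma)
        z = z - eta * anchored_oracle(z_half, anchors, rng, sigma)
        if t >= steps // 2:
            total += z
            count += 1
    return total / count


def recursive_anchoring(z0, lam0=0.1, stages=4, steps=2000, sigma=0.01, seed=0):
    """Solve a sequence of anchored subproblems, re-anchoring after each stage."""
    rng = np.random.default_rng(seed)
    anchors = [(lam0, z0.copy())]
    z = z0.copy()
    for s in range(stages):
        z = seg(z, anchors, rng, sigma, steps)           # approximate subproblem solution
        anchors.append((lam0 * 2 ** (s + 1), z.copy()))  # new anchor with doubled weight
    return z


z0 = np.ones(2 * n)
z_out = recursive_anchoring(z0)
print(np.linalg.norm(F(z0)), np.linalg.norm(F(z_out)))
```

A small `lam0` keeps the first subproblem's gradient small: for monotone $F$, its exact solution $z_1$ satisfies $\|F(z_1)\| = \lambda_0 \|z_1 - z_0\| \le \lambda_0 \|z_0 - z^*\|$. The growing anchor weights then make later subproblems increasingly strongly monotone and hence easier to solve.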

Abstract

We study the problem of finding a near-stationary point for smooth minimax optimization. The recently proposed extra anchored gradient (EAG) methods achieve the optimal convergence rate for convex-concave minimax problems in the deterministic setting. However, the direct extension of EAG to stochastic optimization is not efficient. In this paper, we design a novel stochastic algorithm called Recursive Anchored IteratioN (RAIN). We show that RAIN achieves near-optimal stochastic first-order oracle (SFO) complexity for stochastic minimax optimization in both the convex-concave and strongly-convex-strongly-concave cases. In addition, we extend the idea of RAIN to structured nonconvex-nonconcave minimax problems, where it also achieves near-optimal SFO complexity.
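For reference, here is a minimal sketch of a deterministic extra anchored gradient iteration, the baseline the abstract mentions. It assumes the constant-step form $z_{k+1/2} = z_k + \beta_k (z_0 - z_k) - \alpha F(z_k)$ and $z_{k+1} = z_k + \beta_k (z_0 - z_k) - \alpha F(z_{k+1/2})$ with $\beta_k = 1/(k+2)$ and $\alpha = 1/(8L)$, and runs it on a hypothetical bilinear toy problem with exact gradients; it is illustrative only, not the paper's code.

```python
import numpy as np

# Hypothetical toy bilinear problem f(x, y) = x^T A y (illustration only).
A = np.array([[1.0, 2.0], [0.0, 1.0]])
n = A.shape[0]


def F(z):
    """Gradient operator F(z) = (grad_x f, -grad_y f)."""
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -A.T @ x])


def eag(z0, iters):
    """Extra anchored gradient with constant step alpha and weight 1 / (k + 2)."""
    lip = np.linalg.norm(A, 2)   # Lipschitz constant of F for a bilinear f
    alpha = 1.0 / (8.0 * lip)    # assumed step size, alpha <= 1 / (8 L)
    z = z0.copy()
    for k in range(iters):
        beta = 1.0 / (k + 2)     # anchoring weight pulling iterates back toward z0
        z_half = z + beta * (z0 - z) - alpha * F(z)
        z = z + beta * (z0 - z) - alpha * F(z_half)
    return z


z0 = np.ones(2 * n)
for iters in (10, 100, 1000, 5000):
    print(iters, np.linalg.norm(F(eag(z0, iters))))
```

Every step pulls the iterate back toward the fixed anchor $z_0$ with a weight that decays like $1/k$; in the deterministic convex-concave setting this drives $\|F(z_k)\|^2$ down at an $O(1/k^2)$ rate.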

Paper Structure (41 sections, 23 theorems, 132 equations, 4 figures, 2 tables)

Introduction
Notation and Preliminaries
The Recursive Anchored Iteration
Connection to Related Work
Complexity Analysis for RAIN
Extension to Nonconvex-Nonconcave Settings
The Lower Complexity Bounds
Numerical Experiments
The Convex-Concave Case
The Nonconvex-Nonconcave Case
Conclusion
The Proofs in Section 3
The Proof of Lemma 3.1
The Proof of Lemma 3.2
The Proofs in Section 4
...and 26 more sections

Key Result

Lemma 2.1

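Under Assumptions asm:smooth (smoothness) and asm:cc (convex-concavity), the gradient operator $F(z) = (\nabla_x f(x,y), -\nabla_y f(x,y))$ is monotone, that is, $\langle F(z) - F(z'), z - z' \rangle \ge 0$ for all $z=(x,y)$ and $z'=(x',y')$.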
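As a numerical sanity check of the monotonicity property named in Lemma 2.1, namely $\langle F(z) - F(z'), z - z' \rangle \ge 0$ with $F(z) = (\nabla_x f, -\nabla_y f)$, the sketch below evaluates this gap at random pairs of points for a hypothetical convex-concave quadratic. The matrices and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# Hypothetical convex-concave quadratic (not from the paper):
# f(x, y) = 0.5 x^T P x + x^T B y - 0.5 y^T Q y with P, Q positive semidefinite.
Gx = rng.standard_normal((n, n))
Gy = rng.standard_normal((m, m))
P, Q = Gx @ Gx.T, Gy @ Gy.T
B = rng.standard_normal((n, m))


def F(z):
    """Gradient operator F(z) = (grad_x f, -grad_y f)."""
    x, y = z[:n], z[n:]
    grad_x = P @ x + B @ y
    grad_y = B.T @ x - Q @ y
    return np.concatenate([grad_x, -grad_y])


def monotonicity_gap(z, zp):
    """<F(z) - F(z'), z - z'>, nonnegative whenever f is convex-concave."""
    return float(np.dot(F(z) - F(zp), z - zp))


gaps = [monotonicity_gap(rng.standard_normal(n + m), rng.standard_normal(n + m))
        for _ in range(1000)]
print(min(gaps) >= -1e-9)  # True
```

For this quadratic the bilinear cross terms cancel, so the gap equals $(x-x')^\top P (x-x') + (y-y')^\top Q (y-y')$, which is why it can never be negative.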

Figures (4)

Figure 1: Gradient norm against the number of SFO calls on problem (func-xy).
Figure 2: Gradient norm against the number of SFO calls on problem (func-delta). SEAG diverges in (c); this does not contradict its convergence guarantee, because the condition $\sigma_k^2 \le \epsilon / (k+1)$ required by Theorem 6.1 of Lee and Kim (2021) is not satisfied.
Figure 3: Gradient norm against the number of SFO calls on problem (func-rho) with $L=1$ and $\rho = -1/(8\sqrt{2})$.
Figure 4: Gradient norm against the number of SFO calls on problem (func-rho) with $L=1$ and $\rho = -1/3$.
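The variance condition $\sigma_k^2 \le \epsilon/(k+1)$ cited in the Figure 2 caption can be turned into a mini-batch size with simple arithmetic. Assuming, hypothetically, that each oracle evaluation at iteration $k$ averages $b_k$ i.i.d. samples with per-sample variance $\sigma^2$, the variance becomes $\sigma^2/b_k$, so the condition requires $b_k \ge \sigma^2 (k+1)/\epsilon$. The helper names and numbers below are illustrative assumptions, not the experimental setup of the paper.

```python
import math
from fractions import Fraction


def min_batch(k, sigma2, eps):
    """Smallest mini-batch size b with sigma2 / b <= eps / (k + 1).

    Averaging b i.i.d. samples divides the per-sample variance sigma2 by b,
    so we need b >= sigma2 * (k + 1) / eps. Exact fractions avoid rounding.
    """
    return math.ceil(Fraction(sigma2) * (k + 1) / Fraction(eps))


def total_samples(K, sigma2, eps):
    """Total samples over iterations k = 0, ..., K - 1, one mini-batch each."""
    return sum(min_batch(k, sigma2, eps) for k in range(K))


eps = Fraction(1, 100)  # hypothetical target accuracy
sigma2 = 1              # hypothetical per-sample variance
print(min_batch(0, sigma2, eps), min_batch(99, sigma2, eps))  # 100 10000
print(total_samples(100, sigma2, eps))                         # 505000
```

The required batch grows linearly in $k$, so the total number of samples over $K$ iterations grows like $\sigma^2 K (K+1) / (2\epsilon)$ under these assumptions; when the batch schedule does not keep up, the condition fails, which is the caption's explanation for SEAG's divergence in panel (c).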

Theorems & Definitions (31)

Lemma 2.1: monotonicity
Lemma 2.2: strong monotonicity
Definition 2.1: nearly-stationary point
Lemma 3.1: recursively anchoring lemma
Remark 3.1
Lemma 3.2: anchoring lemma
Lemma 4.1: SEG
Lemma 4.2: Epoch-SEG
Theorem 4.1: RAIN, SCSC
Theorem 4.2: RAIN, CC
...and 21 more
