Two-timescale Extragradient for Finding Local Minimax Points

Jiseok Chae; Kyuwon Kim; Donghwan Kim

TL;DR

This paper analyzes minimax optimization in nonconvex-nonconcave settings by proposing a two-timescale extragradient (EG) method and studying its dynamics via dynamical-systems theory. It develops a refined second-order characterization of local minimax points using a restricted Schur complement and introduces hemicurvature to capture spectral behavior when the Hessian in the maximization variable is degenerate. The authors prove that two-timescale EG converges to points satisfying a refined second-order necessary condition, almost surely avoids strict non-minimax points, and, under Minty variational inequality (MVI), globally converges to local minimax points. The results extend prior work by removing nondegeneracy requirements on ${\nabla_{yy}^2 f}$ and provide a unified, dynamical-systems framework for understanding stability and convergence in minimax optimization, with implications for training generative models and adversarial settings.
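For context on what the nondegeneracy requirement on $\nabla_{yy}^2 f$ means in practice, here is a minimal sketch of the classical second-order sufficient check for a strict local minimax point at a stationary point: $\nabla_{yy}^2 f \prec 0$ and the Schur complement $\nabla_{xx}^2 f - \nabla_{xy}^2 f (\nabla_{yy}^2 f)^{-1} \nabla_{yx}^2 f \succ 0$. This is the classical condition, not the paper's restricted Schur complement, and the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def is_strict_local_minimax(h_xx, h_xy, h_yy, tol=1e-10):
    """Classical sufficient second-order check at a stationary point.

    Hypothetical helper: it requires h_yy to be negative definite, so it
    cannot certify anything when h_yy is degenerate.
    """
    h_xx, h_xy, h_yy = (
        np.atleast_2d(np.asarray(m, dtype=float)) for m in (h_xx, h_xy, h_yy)
    )
    # The inner maximization must be strictly concave at the point.
    if np.linalg.eigvalsh(h_yy).max() >= -tol:
        return False
    # Schur complement: h_xx - h_xy h_yy^{-1} h_xy^T must be positive definite.
    schur = h_xx - h_xy @ np.linalg.solve(h_yy, h_xy.T)
    return bool(np.linalg.eigvalsh(schur).min() > tol)

# f(x, y) = -x^2 + 4xy - 2y^2: the check passes even though h_xx < 0.
print(is_strict_local_minimax(-2.0, 4.0, -4.0))
# f(x, y) = x*y: h_yy = 0 is degenerate, so the classical check does not apply.
print(is_strict_local_minimax(0.0, 1.0, 0.0))
```

The second call illustrates the gap: a bilinear objective has a degenerate $\nabla_{yy}^2 f$, so a condition that inverts this block says nothing about it.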

Abstract

Minimax problems are notoriously challenging to optimize. However, we show that the two-timescale extragradient method can be a viable solution. Using dynamical systems theory, we prove that it converges to points that satisfy the second-order necessary condition of local minimax points, under mild conditions in which two-timescale gradient descent ascent fails. This work provably improves upon all previous results on finding local minimax points by eliminating a crucial assumption that the Hessian with respect to the maximization variable is nondegenerate.
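The contrast between the two methods can be seen on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes the bilinear objective $f(x, y) = xy$ (whose Hessian in $y$ is degenerate), and it assumes the timescale convention in which the $x$-update uses step size $\eta/\tau$ and the $y$-update uses $\eta$. The function names and the values of $\eta$, $\tau$, and the step count are illustrative choices.

```python
import numpy as np

def field(z):
    """Saddle-gradient field F = (df/dx, -df/dy) for the toy f(x, y) = x * y."""
    x, y = z
    return np.array([y, -x])

def tau_gda(z0, eta, tau, steps):
    """Two-timescale gradient descent ascent (assumed convention: x uses eta/tau)."""
    scale = np.array([1.0 / tau, 1.0])
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * scale * field(z)
    return z

def tau_eg(z0, eta, tau, steps):
    """Two-timescale extragradient with the same scaling on both half-steps."""
    scale = np.array([1.0 / tau, 1.0])
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z_half = z - eta * scale * field(z)   # extrapolation step
        z = z - eta * scale * field(z_half)   # update using the look-ahead field
    return z

z_gda = tau_gda([1.0, 1.0], eta=0.5, tau=2.0, steps=200)
z_eg = tau_eg([1.0, 1.0], eta=0.5, tau=2.0, steps=200)
print("GDA distance from origin:", np.linalg.norm(z_gda))
print("EG distance from origin: ", np.linalg.norm(z_eg))
```

On this example the GDA iterates spiral away from the stationary point at the origin while the EG iterates contract toward it, which is the well-known behavior of the two methods on bilinear problems.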

Paper Structure (55 sections, 46 theorems, 138 equations, 5 figures)

Introduction
Preliminaries
Notations and problem setting
Local minimax points
Gradient descent ascent and extragradient
Dynamical systems
On the necessary condition of local minimax points
Restricted Schur complement
Refining the second-order necessary condition
Characterizing timescale separation without nondegeneracy condition on $\nabla_{yy}^2 f$
Timescale separation in GDA and its relation to stability
Timescale separation without the nondegeneracy condition on $\nabla_{yy}^2 f$
Hemicurvature of the eigenvalue function $\lambda_j(\epsilon)$
Two-timescale GDA avoids some non-strict local minimax points
Local minimax points and the limit points of two-timescale EG
...and 40 more sections

Key Result

Lemma 2.1

Under Assumption assum:hessian, let $s \coloneqq \eta/2$ satisfy $0 < s < 1/L$. Then the ordinary differential equation $\dot{{\bm{z}}}(t) = - ({\bm{I}} + sD{\bm{F}}({\bm{z}}(t)))^{-1}{\bm{F}}({\bm{z}}(t))$ is an $O(s)$-approximation of EG.
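To see what the correction factor $({\bm{I}} + sD{\bm{F}})^{-1}$ does, consider a linear field ${\bm{F}}({\bm{z}}) = {\bm{J}}{\bm{z}}$, for which the ODE becomes $\dot{{\bm{z}}} = -({\bm{I}} + s{\bm{J}})^{-1}{\bm{J}}{\bm{z}}$. The sketch below is an illustration under assumptions, not part of the lemma: it takes the toy objective $f(x, y) = xy$, so that ${\bm{J}}$ is a rotation generator with $L = 1$, and compares the spectrum of this system with that of the plain gradient flow $\dot{{\bm{z}}} = -{\bm{J}}{\bm{z}}$. The helper name `eg_ode_matrix` is hypothetical.

```python
import numpy as np

def eg_ode_matrix(jac, s):
    """Return A with z' = A z for the ODE z' = -(I + s*DF)^{-1} F(z), F(z) = jac @ z."""
    n = jac.shape[0]
    # Solve (I + s*jac) X = jac instead of forming the inverse explicitly.
    return -np.linalg.solve(np.eye(n) + s * jac, jac)

# DF for the assumed toy objective f(x, y) = x*y, i.e. F(z) = (y, -x).
jac = np.array([[0.0, 1.0], [-1.0, 0.0]])
s = 0.25  # s = eta/2 with eta = 0.5; satisfies 0 < s < 1/L for L = 1

eigs_eg = np.linalg.eigvals(eg_ode_matrix(jac, s))
eigs_gda = np.linalg.eigvals(-jac)

print("EG-type ODE eigenvalues:", eigs_eg)
print("gradient-flow eigenvalues:", eigs_gda)
```

For this field the gradient flow has purely imaginary eigenvalues $\pm i$ (a pure rotation), whereas the corrected system has eigenvalues $(-s \mp i)/(1 + s^2)$, whose negative real part makes the origin linearly stable.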

Figures (5)

Figure 1: Two disks and a curve, all tangent to the same line at the same point.
Figure 2: A depiction of the target sets of continuous two-timescale methods. Notice that for $\tau$-EG we have overlaid two target sets with different choices of $s$ on the same plot.
Figure 3: The peanut-shaped region $\mathcal{P}_\eta$ in the complex plane.
Figure 4: Two peanut-shaped regions and a curve, all tangent to the same line at the same point.
Figure 5: Target sets of discrete-time $\tau$-EG and $\tau$-GDA, both using $\eta$ as their step sizes. The (blue) peanut-shaped region is the target set of $\tau$-EG, and the (orange) disk is the target set of $\tau$-GDA.

Theorems & Definitions (95)

Definition 1: jin:20:wil
Lemma 2.1
Definition 2: Linear stability of dynamical systems
Definition 3
Theorem 2.2: lee:19:fom
Definition 4
Proposition 3.1
Proposition 3.2: Refined second-order necessary condition
Remark 3.3
Definition 5: Strict non-minimax point; ${\mathcal{T}}^*$
...and 85 more
