Two-timescale Extragradient for Finding Local Minimax Points

Jiseok Chae, Kyuwon Kim, Donghwan Kim

TL;DR

This paper analyzes minimax optimization in nonconvex-nonconcave settings by proposing a two-timescale extragradient (EG) method and studying its dynamics via dynamical-systems theory. It develops a refined second-order characterization of local minimax points using a restricted Schur complement and introduces hemicurvature to capture spectral behavior when the Hessian in the maximization variable is degenerate. The authors prove that two-timescale EG converges to points satisfying a refined second-order necessary condition, almost surely avoids strict non-minimax points, and, under Minty variational inequality (MVI), globally converges to local minimax points. The results extend prior work by removing nondegeneracy requirements on ${\nabla_{yy}^2 f}$ and provide a unified, dynamical-systems framework for understanding stability and convergence in minimax optimization, with implications for training generative models and adversarial settings.

Abstract

Minimax problems are notoriously challenging to optimize. However, we demonstrate that the two-timescale extragradient method can be a viable solution. Using dynamical systems theory, we show that it converges to points that satisfy the second-order necessary condition of local minimax points, under mild conditions where two-timescale gradient descent ascent fails. This work provably improves upon all previous results on finding local minimax points by eliminating a crucial assumption that the Hessian with respect to the maximization variable is nondegenerate.
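To make the contrast between the two methods concrete, here is a minimal sketch on a toy problem. Everything in it is an illustrative assumption rather than something taken from the paper: the objective $f(x, y) = xy$ (chosen because its Hessian in $y$ is identically zero, so the nondegeneracy assumption fails), the convention that the minimization variable $x$ uses step size $\eta/\tau$ while $y$ uses $\eta$, and the particular values of $\eta$ and $\tau$.

```python
import math


def grad_f(x, y):
    """Gradient of the toy objective f(x, y) = x * y (illustrative choice)."""
    return y, x  # (df/dx, df/dy)


def gda_step(x, y, eta, tau):
    """One two-timescale GDA step: descent in x (step eta/tau), ascent in y (step eta)."""
    gx, gy = grad_f(x, y)
    return x - (eta / tau) * gx, y + eta * gy


def eg_step(x, y, eta, tau):
    """One two-timescale EG step: extrapolate first, then update with the extrapolated gradient."""
    gx, gy = grad_f(x, y)
    xh, yh = x - (eta / tau) * gx, y + eta * gy  # extrapolation point
    gx, gy = grad_f(xh, yh)                      # gradient evaluated at the extrapolation point
    return x - (eta / tau) * gx, y + eta * gy


def run(step, x0=1.0, y0=1.0, eta=0.5, tau=2.0, iters=500):
    """Iterate `step` and return the final distance to the stationary point (0, 0)."""
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, eta, tau)
    return math.hypot(x, y)


eg_dist = run(eg_step)
gda_dist = run(gda_step)
print(f"EG  distance to (0, 0): {eg_dist:.3e}")
print(f"GDA distance to (0, 0): {gda_dist:.3e}")
```

On this toy problem the EG iterates contract toward the stationary point $(0, 0)$ while the GDA iterates spiral outward. This only illustrates the qualitative gap the abstract describes; it is not a reproduction of the paper's experiments or assumptions.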

Paper Structure

This paper contains 55 sections, 46 theorems, 138 equations, 5 figures.

Key Result

Lemma 2.1

Under Assumption assum:hessian, let $s\coloneqq \eta/2$ and $0< s < 1/L$. Then, the ordinary differential equation $\dot{{\bm{z}}}(t) = - ({\bm{I}} + sD{\bm{F}}({\bm{z}}(t)))^{-1}{\bm{F}}({\bm{z}}(t))$ is an $O(s)$-approximation of EG.
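A quick numerical sanity check of this lemma is possible in the linear case. The sketch below assumes a scalar linear field ${\bm{F}}(z) = \lambda z$ with a complex $\lambda$ (a hypothetical stand-in for one eigenvalue of $D{\bm{F}}$), for which both the EG step and the ODE flow over time $\eta$ reduce to multiplication by a closed-form factor. It compares the one-step mismatch of the lemma's ODE with that of the plain gradient flow $\dot{z} = -{\bm{F}}(z)$. The scaling it reports describes this toy computation only and is not a restatement of the paper's definition of an $O(s)$-approximation.

```python
import cmath


def eg_multiplier(lam, eta):
    """EG on F(z) = lam * z: z - eta*lam*(z - eta*lam*z) = (1 - eta*lam + (eta*lam)^2) * z."""
    return 1 - eta * lam + (eta * lam) ** 2


def ode_flow_multiplier(lam, eta):
    """Exact time-eta flow of dz/dt = -(1 + s*lam)^(-1) * lam * z with s = eta / 2."""
    s = eta / 2
    return cmath.exp(-eta * lam / (1 + s * lam))


def gradient_flow_multiplier(lam, eta):
    """Exact time-eta flow of the plain gradient flow dz/dt = -lam * z."""
    return cmath.exp(-eta * lam)


def one_step_errors(lam, eta):
    """Distance of each continuous-time multiplier from the discrete EG multiplier."""
    eg = eg_multiplier(lam, eta)
    ode_err = abs(ode_flow_multiplier(lam, eta) - eg)
    gf_err = abs(gradient_flow_multiplier(lam, eta) - eg)
    return ode_err, gf_err


lam = 0.3 + 1.0j  # hypothetical eigenvalue; |s * lam| < 1 for the step sizes used here
ode_err_big, gf_err_big = one_step_errors(lam, 0.02)
ode_err_small, gf_err_small = one_step_errors(lam, 0.01)

# Halving eta: a ratio near 8 indicates an eta^3 local error, a ratio near 4 indicates eta^2.
ode_ratio = ode_err_big / ode_err_small
gf_ratio = gf_err_big / gf_err_small
print(f"lemma ODE:     error {ode_err_small:.3e}, ratio {ode_ratio:.2f}")
print(f"gradient flow: error {gf_err_small:.3e}, ratio {gf_ratio:.2f}")
```

In this linear setting, expanding both factors in $\eta$ shows that the lemma's ODE agrees with the EG step through the $\eta^2$ term, whereas the plain gradient flow already differs at order $\eta^2$. That is one way to see why the correction factor $({\bm{I}} + sD{\bm{F}})^{-1}$ matters.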

Figures (5)

  • Figure 1: Two disks and a curve, all tangent to the same line at the same point.
  • Figure 2: A depiction of the target sets of continuous two-timescale methods. Notice that for $\tau$-EG we have overlaid two target sets with different choices of $s$ on the same plot.
  • Figure 3: The peanut-shaped region $\mathcal{P}_\eta$ in the complex plane.
  • Figure 4: Two peanut-shaped regions and a curve, all tangent to the same line at the same point.
  • Figure 5: Target sets of discrete-time $\tau$-EG and $\tau$-GDA, both using $\eta$ as their step sizes. The (blue) peanut-shaped region is the target set of $\tau$-EG, and the (orange) disk is the target set of $\tau$-GDA.
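The shapes named in the Figure 5 caption can be explored with a rough sketch based on the usual linear stability test: for a linear field with eigenvalue $\lambda$, one GDA step multiplies the corresponding mode by $1 - \eta\lambda$ and one EG step by $1 - \eta\lambda + \eta^2\lambda^2$, and the mode contracts when the multiplier has modulus below 1. Reading the figure's complex plane as the plane of such eigenvalues is an assumption made here, and the sets drawn below are not claimed to coincide with the paper's exact definition of $\mathcal{P}_\eta$ or of the target sets.

```python
def gda_multiplier(lam, eta):
    """Per-step factor of GDA on a linear mode with eigenvalue lam."""
    return 1 - eta * lam


def eg_multiplier(lam, eta):
    """Per-step factor of EG on a linear mode with eigenvalue lam."""
    return 1 - eta * lam + (eta * lam) ** 2


def is_stable(multiplier, lam, eta):
    """A mode contracts when its per-step factor has modulus strictly below 1."""
    return abs(multiplier(lam, eta)) < 1


def ascii_map(eta=1.0, re_span=(-0.5, 2.5), im_span=(-1.5, 1.5), cols=61, rows=25):
    """Coarse text picture: B = both stable, E = EG only, G = GDA only, . = neither."""
    lines = []
    for r in range(rows):
        im = im_span[1] - r * (im_span[1] - im_span[0]) / (rows - 1)
        row = []
        for c in range(cols):
            re = re_span[0] + c * (re_span[1] - re_span[0]) / (cols - 1)
            lam = complex(re, im)
            eg = is_stable(eg_multiplier, lam, eta)
            gda = is_stable(gda_multiplier, lam, eta)
            row.append("B" if eg and gda else "E" if eg else "G" if gda else ".")
        lines.append("".join(row))
    return lines


print("\n".join(ascii_map()))
```

With $\eta = 1$, the GDA-stable set is the disk of radius 1 centered at 1, while the EG-stable set is a dented oval stretched along the imaginary direction. A nearly imaginary eigenvalue such as $\lambda = 0.5i$ is stable for EG but not for GDA, and a large real eigenvalue such as $\lambda = 1.5$ shows the reverse.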

Theorems & Definitions (95)

  • Definition 1: jin:20:wil
  • Lemma 2.1
  • Definition 2: Linear stability of dynamical systems
  • Definition 3
  • Theorem 2.2: lee:19:fom
  • Definition 4
  • Proposition 3.1
  • Proposition 3.2: Refined second-order necessary condition
  • Remark 3.3
  • Definition 5: Strict non-minimax point; ${\mathcal{T}}^*$
  • ...and 85 more