Accelerated Minimax Algorithms Flock Together

TaeHo Yoon, Ernest K. Ryu

TL;DR

This work studies accelerated minimax optimization by introducing the merging path (MP) property, revealing that anchored acceleration methods share near-equivalent trajectories that rapidly merge to the solution. It proves an O(1/k^2)-MP relation among EAG, FEG, APS, and OHM, establishing point convergence and unifying multiple acceleration schemes. The authors then design SM-EAG+ to approximate OC-Halpern, achieving the fastest known gradient-norm rate for unconstrained smooth strongly-convex-strongly-concave minimax problems, and develop APG* to near-optimally accelerate prox-grad-type minimax problems via MP. Theoretical analyses are complemented by numerical experiments demonstrating rapid MP-driven convergence, with extensions to Hilbert spaces and open questions on the fundamental MP mechanism and broader applicability.

Abstract

Several new accelerated methods in minimax optimization and fixed-point iterations have recently been discovered, and, interestingly, they rely on a mechanism distinct from Nesterov's momentum-based acceleration. In this work, we show that these accelerated algorithms exhibit what we call the merging path (MP) property; the trajectories of these algorithms merge quickly. Using this novel MP property, we establish point convergence of existing accelerated minimax algorithms and derive new state-of-the-art algorithms for the strongly-convex-strongly-concave setup and for the prox-grad setup.
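The merging behaviour described above can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes the EAG-C update of Yoon and Ryu (2021) and a Halpern-anchored resolvent iteration with anchoring coefficient 1/(k+2), and it runs both on a hypothetical two-dimensional bilinear operator. The operator `A`, the step-size `alpha`, and the function names are illustrative assumptions, and the pairing of methods is not claimed to match the exact statement of the paper's theorems.

```python
import numpy as np

# Hypothetical toy operator B(z) = A z (saddle operator of x*y): monotone, L = 1.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = 1.0
alpha = 1.0 / (8.0 * L)          # illustrative step-size choice
z0 = np.array([1.0, 0.0])        # shared starting point, which is also the anchor
num_iters = 1000

def eag_c(z0, num_iters):
    """Anchored extragradient with anchoring coefficient 1/(k+2) (assumed EAG-C form)."""
    z = z0.copy()
    traj = [z.copy()]
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        anchored = z + beta * (z0 - z)           # pull the iterate toward the anchor z0
        z_half = anchored - alpha * (A @ z)      # extrapolation step
        z = anchored - alpha * (A @ z_half)      # update step
        traj.append(z.copy())
    return np.array(traj)

def halpern_resolvent(z0, num_iters):
    """Halpern iteration applied to the resolvent J = (I + alpha * A)^{-1}."""
    J = np.linalg.inv(np.eye(2) + alpha * A)
    z = z0.copy()
    traj = [z.copy()]
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        z = beta * z0 + (1.0 - beta) * (J @ z)
        traj.append(z.copy())
    return np.array(traj)

eag = eag_c(z0, num_iters)
hal = halpern_resolvent(z0, num_iters)

gap = np.linalg.norm(eag - hal, axis=1)          # distance between the two paths
dist_to_solution = np.linalg.norm(eag, axis=1)   # the solution is z = 0

for k in (10, 100, 1000):
    print(k, gap[k], dist_to_solution[k])
```

In this toy run the two paths end up closer to each other than either is to the solution, which is the qualitative picture the MP property describes.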

Paper Structure (43 sections, 18 theorems, 190 equations, 3 figures)

This paper contains 43 sections, 18 theorems, 190 equations, 3 figures.

Key Result

theorem 1

Let $\mathbb{B}\colon \mathbb{R}^d \to \mathbb{R}^d$ be monotone and $L$-Lipschitz and assume $z_\star \in \mathrm{Zer}\,\mathbb{B} = \mathrm{Fix}\,\mathbb{J}_{\alpha\mathbb{B}}$ …
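The hypotheses in this statement (monotonicity, $L$-Lipschitz continuity, and zeros of the operator coinciding with fixed points of its resolvent) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than anything from the paper: the coupling matrix `M`, the step-size `alpha`, and all function names are hypothetical, and the operator is the linear saddle operator of $x^\top M y$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))   # hypothetical coupling matrix

# Saddle operator of x^T M y: B(x, y) = (M y, -M^T x), written as a skew-symmetric matrix.
A = np.block([[np.zeros((3, 3)), M], [-M.T, np.zeros((3, 3))]])

def B(z):
    return A @ z

L = np.linalg.norm(A, 2)          # Lipschitz constant of a linear map is its spectral norm
alpha = 0.5 / L                   # illustrative step-size

def resolvent(z):
    """J_{alpha B}(z) = (I + alpha B)^{-1} z, computed with a linear solve."""
    return np.linalg.solve(np.eye(6) + alpha * A, z)

z = rng.standard_normal(6)
w = rng.standard_normal(6)

# Monotonicity: <B z - B w, z - w> >= 0 (zero up to rounding for a skew-symmetric A).
monotonicity_gap = (B(z) - B(w)) @ (z - w)

# Lipschitz continuity: ||B z - B w|| <= L ||z - w||.
lipschitz_ratio = np.linalg.norm(B(z) - B(w)) / np.linalg.norm(z - w)

# A zero of B is a fixed point of the resolvent.
z_star = np.zeros(6)
fixed_point_error = np.linalg.norm(resolvent(z_star) - z_star)

# Identity behind the equivalence: z - J(z) = alpha * B(J(z)) for every z,
# so z = J(z) holds exactly when B(z) = 0.
Jz = resolvent(z)
identity_error = np.linalg.norm((z - Jz) - alpha * B(Jz))
```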

Figures (3)

  • Figure 1: Trajectories of (left) Nesterov's accelerated algorithms (AGM; Nesterov, 1983; Beck and Teboulle, 2009; Chambolle and Dossal, 2015; described in \ref{subsec:AGM_and_FISTA}) with $\alpha=0.025$ and distinct momentum parameters and (right) anchoring-based algorithms with $\alpha=0.1$. Here, $\alpha$ denotes the step-size. On the right, the paths quickly merge and become indistinguishable within 50 iterations. All algorithms are executed on the minimization problem with $f(x_1,x_2)=\frac{4x_1^2}{x_2}$ (which is convex and smooth on, e.g., $|x_1| \le 2, x_2 \ge 3$, a region in which all iterates remain), starting at $(x_1,x_2) = (-2,3)$.
  • Figure 2: Comparison of EG, OG and SM-EAG+ on the problem \ref{eqn:smeag-experiment-objective}. The condition number $\frac{L}{\mu}$ of the problem is $10^5$. We use the largest possible step-size for SM-EAG+, and tuned step-sizes for EG and OG (for best performance with respect to the gradient norm). In the left figure, the dashed black line labeled as "Theory" indicates the theoretical upper bound on $\|\mathbb{B} z_k\|^2$ from \ref{thm:sm-EAG}.
  • Figure 3: Comparison of mirror-prox, dual extrapolation and APG* on the problem \ref{eqn:apg-experiment-objective}, in terms of the squared forward-backward residual norm.
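
The Figure 1 caption states that $f(x_1,x_2)=\frac{4x_1^2}{x_2}$ is convex and smooth on $|x_1| \le 2, x_2 \ge 3$. A minimal numerical check of that claim is sketched below. It is not part of the paper: the sampling grid (with $x_2$ capped at 10) and the function names are assumptions made for illustration, and the gradient and Hessian are derived by hand from the formula in the caption.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return 4.0 * x1**2 / x2

def grad_f(x):
    x1, x2 = x
    return np.array([8.0 * x1 / x2, -4.0 * x1**2 / x2**2])

def hess_f(x):
    x1, x2 = x
    off_diag = -8.0 * x1 / x2**2
    return np.array([
        [8.0 / x2, off_diag],
        [off_diag, 8.0 * x1**2 / x2**3],
    ])

# Hypothetical sampling grid over part of the region named in the caption.
x1_grid = np.linspace(-2.0, 2.0, 41)
x2_grid = np.linspace(3.0, 10.0, 71)

min_eig = np.inf    # convexity requires this to be >= 0 (up to rounding)
max_eig = -np.inf   # an estimate of the smoothness constant on the grid
for x1 in x1_grid:
    for x2 in x2_grid:
        eigs = np.linalg.eigvalsh(hess_f((x1, x2)))
        min_eig = min(min_eig, eigs[0])
        max_eig = max(max_eig, eigs[-1])

start = np.array([-2.0, 3.0])   # starting point from the caption
print(min_eig, max_eig, grad_f(start))
```

The Hessian has zero determinant, so its eigenvalues are 0 and its trace $\frac{8}{x_2} + \frac{8x_1^2}{x_2^3}$. The trace is largest at $(\pm 2, 3)$, where it equals $104/27 \approx 3.85$, so both step-sizes in the caption ($0.025$ and $0.1$) are below the reciprocal of this smoothness estimate.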

Theorems & Definitions (27)

  • theorem 1: EAG$\approx$FEG$\approx$APS$\approx$OHM
  • proof
  • lemma 2
  • corollary 3
  • theorem 4: SM-EAG+$\approx$OC-Halpern
  • theorem 5: Fast rate of SM-EAG+
  • theorem 6
  • lemma 7
  • lemma 8
  • theorem 9: APG*$\approx$\ref{eqn:DRS-Halpern_alternate_form}
  • ...and 17 more