When is Momentum Extragradient Optimal? A Polynomial-Based Analysis

Junhyung Lyle Kim; Gauthier Gidel; Anastasios Kyrillidis; Fabian Pedregosa

TL;DR

This work analyzes Momentum Extragradient (MEG) for differentiable games through a polynomial-based lens that ties convergence to the Jacobian spectrum of the game dynamics. By expressing MEG residuals with Chebyshev polynomials and introducing a link function σ(λ), the authors classify spectra into three robust modes and derive exact optimal hyperparameters (h, γ, m) for each. The resulting asymptotic rates show accelerated convergence in all three cases, with Case 1 exhibiting a super-accelerated behavior and Cases 2–3 achieving rates close to or beyond established lower bounds for complex-spectrum problems. Local convergence for non-affine vector fields is established via momentum restarting, and numerical experiments on quadratic games corroborate the theory, highlighting MEG’s superiority over GD, EG, and GDM under the studied spectral conditions.
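To make the roles of the three hyperparameters concrete, here is a minimal sketch of one common way to write the momentum extragradient update, $w_{t+1} = w_t - h\,v(w_t - \gamma\,v(w_t)) + m\,(w_t - w_{t-1})$. The update form is assumed from the standard MEG formulation rather than quoted from this page, the toy vector field `v` (a bilinear game $\min_x \max_y xy$) is an illustrative choice, and the values of `h`, `gamma`, and `m` are arbitrary, not the optimal ones derived in the paper.

```python
import math

def v(w):
    # Vector field of the toy bilinear game min_x max_y x*y (illustrative assumption).
    x, y = w
    return (y, -x)

def meg(w0, h, gamma, m, iters):
    """Momentum extragradient sketch: extrapolate with step gamma,
    update with step h, and add momentum m on the last displacement."""
    w_prev, w = w0, w0
    for _ in range(iters):
        g = v(w)
        # Extrapolation (look-ahead) point.
        w_look = (w[0] - gamma * g[0], w[1] - gamma * g[1])
        g_look = v(w_look)
        # Update from w using the look-ahead vector field plus momentum.
        w_next = (
            w[0] - h * g_look[0] + m * (w[0] - w_prev[0]),
            w[1] - h * g_look[1] + m * (w[1] - w_prev[1]),
        )
        w_prev, w = w, w_next
    return w

# Hyperparameters below are arbitrary illustrative values, not tuned ones.
w_final = meg((1.0, 1.0), h=0.5, gamma=0.5, m=0.1, iters=300)
meg_dist = math.hypot(*w_final)  # distance to the equilibrium (0, 0)
print(meg_dist)
```

Setting `m=0` and `gamma=h` in this sketch recovers the plain extragradient step.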

Abstract

The extragradient method has gained popularity due to its robust convergence properties for differentiable games. Unlike single-objective optimization, game dynamics involve complex interactions reflected by the eigenvalues of the game vector field's Jacobian scattered across the complex plane. This complexity can cause the simple gradient method to diverge, even for bilinear games, while the extragradient method achieves convergence. Building on the recently proven accelerated convergence of the momentum extragradient method for bilinear games \citep{azizian2020accelerating}, we use a polynomial-based analysis to identify three distinct scenarios where this method exhibits further accelerated convergence. These scenarios encompass situations where the eigenvalues reside on the (positive) real line, lie on the real line alongside complex conjugates, or exist solely as complex conjugates. Furthermore, we derive the hyperparameters for each scenario that achieve the fastest convergence rate.
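The contrast between the gradient method and extragradient on bilinear games can be checked on a two-dimensional toy problem. The sketch below assumes the game $\min_x \max_y xy$, whose vector field is $v(x, y) = (y, -x)$ with Jacobian eigenvalues $\pm i$; the step size `h = 0.5` and the starting point are arbitrary choices made for illustration.

```python
import math

def v(w):
    # Vector field of min_x max_y x*y; its Jacobian has eigenvalues +i and -i.
    x, y = w
    return (y, -x)

def gradient_step(w, h):
    g = v(w)
    return (w[0] - h * g[0], w[1] - h * g[1])

def extragradient_step(w, h):
    # Evaluate the vector field at a look-ahead point, then step from w.
    w_half = gradient_step(w, h)
    g = v(w_half)
    return (w[0] - h * g[0], w[1] - h * g[1])

def run(step, w0, h, iters):
    w = w0
    for _ in range(iters):
        w = step(w, h)
    return math.hypot(*w)  # distance to the equilibrium (0, 0)

gd_dist = run(gradient_step, (1.0, 1.0), 0.5, 100)
eg_dist = run(extragradient_step, (1.0, 1.0), 0.5, 100)
print(gd_dist, eg_dist)
```

On this problem each gradient step multiplies the squared distance to the equilibrium by $1 + h^2$, so the iterates spiral outward for any $h > 0$, while each extragradient step multiplies it by $1 - h^2 + h^4$, which is below one for $0 < h < 1$.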

Paper Structure (33 sections, 16 theorems, 114 equations, 3 figures, 1 algorithm)

Introduction
Problem Setup and Related Work
Related Work
Unlocking Faster Rates Through Fine-Grained Spectral Shapes.
Momentum Extragradient via Chebyshev Polynomials
Three Modes of the Momentum Extragradient
Robust Region-Induced Problem Cases
Optimal Parameters and Convergence Rates
Case 1: minimization.
Case 2: cross-shaped spectrum.
Case 3: shifted imaginary spectrum.
Comparison with Other Methods
Local Convergence for Non-affine Vector Fields
Experiments
Conclusion
...and 18 more sections

Key Result

Lemma 1

Let $w_t$ be the iterate generated by a first-order method after $t$ iterations, with $v(w) = Aw + b$. Then there exists a real polynomial $P_t$ of degree at most $t$ satisfying $w_t - w^\star = P_t(A)(w_0 - w^\star)$, where $P_t(0) = 1$ and $v(w^\star) = Aw^\star + b = 0$.
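As a worked instance of the lemma, assuming it takes the standard form $w_t - w^\star = P_t(A)(w_0 - w^\star)$, consider the gradient method with a fixed step size $h$. Its residual polynomial is $P_t(\lambda) = (1 - h\lambda)^t$, which has degree $t$ and satisfies $P_t(0) = 1$. The sketch below checks this numerically; the matrix `A`, the solution `w_star`, the starting point `w0`, and the step size are all made-up illustrative values.

```python
import math

def matvec(M, x):
    # 2x2 matrix-vector product, kept explicit to avoid external dependencies.
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

# Illustrative affine vector field v(w) = A w + b with v(w_star) = 0.
A = [[2.0, 1.0], [-1.0, 3.0]]
w_star = [1.0, -2.0]
b = [-c for c in matvec(A, w_star)]
h, t = 0.1, 10
w0 = [4.0, 0.5]

# Left-hand side: run t gradient steps and subtract w_star.
w = list(w0)
for _ in range(t):
    Aw = matvec(A, w)
    w = [w[i] - h * (Aw[i] + b[i]) for i in range(2)]
lhs = [w[i] - w_star[i] for i in range(2)]

# Right-hand side: evaluate P_t(A)(w0 - w_star) from the monomial
# coefficients of P_t(lambda) = (1 - h*lambda)^t.
coeffs = [math.comb(t, k) * (-h) ** k for k in range(t + 1)]
power = [w0[i] - w_star[i] for i in range(2)]  # A^0 (w0 - w_star)
rhs = [0.0, 0.0]
for c in coeffs:
    rhs = [rhs[i] + c * power[i] for i in range(2)]
    power = matvec(A, power)  # advance to the next power of A

max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_gap)
```

The constant coefficient `coeffs[0]` equals one, which is the normalization $P_t(0) = 1$ in the lemma.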

Figures (3)

Figure 1: Convergence rates of MEG in terms of the game Jacobian eigenvalues. The step sizes for MEG, $h$ and $\gamma$, and the momentum parameter $m$ are set according to each case of Theorem \ref{thm:three-modes}, illustrating three distinct convergence modes of MEG. For each case, the red line indicates the robust region (cf. Definition \ref{def:robust_region}) where MEG achieves the optimal convergence rate.
Figure 2: Illustration of the three spectrum models where MEG achieves accelerated convergence rates.
Figure 3: Illustration of the game Jacobian spectra and the performance of the algorithms considered. The Jacobian spectrum in the first plot matches $\mathcal{S}_2^\star$ in \ref{eq:eigen-model} precisely, while that in the third plot follows $\mathcal{S}_2^\star$ only approximately. The second (fourth) plot shows the performance of the algorithms on the quadratic games in \ref{eq:quad-loss-1} whose Jacobian spectrum is the one shown in the first (third) plot.

Theorems & Definitions (33)

Lemma 1: chihara2011introduction
Theorem 1: Residual polynomials of MEG and their Chebyshev representation
Lemma 2: goujaud2022cyclical
Definition 1: Robust region of MEG
Remark 1
Theorem 2: Asymptotic convergence rate of MEG
Theorem 3
Remark 2
Proposition 1
Theorem 4: Case 1
...and 23 more
