Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games

Yi Feng, Kaito Fujii, Stratis Skoulakis, Xiao Wang, Volkan Cevher

TL;DR

This work provides a continuous-time analysis of heavy ball (HB) momentum in min-max games, examining both simultaneous and alternating update schemes near local Nash equilibria. The authors show that, unlike in minimization, smaller momentum widens the range of step sizes for which the algorithm converges locally, and that alternating updates generally converge faster. They also show that the implicit regularization of HB with smaller momentum drives trajectories toward regions of shallower slope, an effect that alternating updates amplify. Numerical experiments on 2D functions and GANs support the predicted local convergence and stability benefits of smaller momentum and alternating updates. Overall, the study reveals fundamental differences between HB dynamics in min-max games and in minimization, offering guidance for designing more stable min-max optimization algorithms. The findings are relevant to training GANs and other adversarial models where stability and convergence are critical.

Abstract

Since Polyak's pioneering work, heavy ball (HB) momentum has been widely studied in minimization. However, its role in min-max games remains largely unexplored. Because HB is a key component of practical algorithms used for min-max problems, such as Adam, this gap limits their effectiveness. In this paper, we present a continuous-time analysis of HB with simultaneous and alternating update schemes in min-max games. Locally, we prove that smaller momentum enhances algorithmic stability by enabling local convergence across a wider range of step sizes, with alternating updates generally converging faster. Globally, we study the implicit regularization of HB and find that smaller momentum guides the algorithms' trajectories towards regions of the loss landscape with shallower slopes, with alternating updates amplifying this effect. Surprisingly, all these phenomena differ from those observed in minimization, where larger momentum yields similar effects. Our results reveal fundamental differences between HB in min-max games and in minimization, and numerical experiments further validate our theoretical results.

Paper Structure

This paper contains 56 sections, 25 theorems, 155 equations, 18 figures, 1 algorithm.

Key Result

Proposition 2.1

If $\alpha = \max_{\lambda \in \mathrm{Sp}({\mathcal{J}}_g)} \Re(\lambda) < 0$, then there exist constants $\delta > 0$ and $C > 0$ such that for all initial conditions satisfying $\lVert x(0) - \tilde{x} \rVert \le \delta$, we have $\lVert x(t) - \tilde{x} \rVert \le C e^{t \alpha}, \forall t>0.$
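
The criterion in Proposition 2.1 can be checked numerically on a small example. The sketch below assumes, since this page does not define the symbols, that $\tilde{x}$ is an equilibrium of a system $\dot{x} = g(x)$ and that ${\mathcal{J}}_g$ is the Jacobian of $g$ at $\tilde{x}$. It uses a hypothetical linear system with $\tilde{x} = 0$, so the flow is available in closed form; the matrix `J`, the starting point `x0`, and the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def spectral_abscissa(J):
    """Largest real part among the eigenvalues of the square matrix J."""
    return float(np.max(np.linalg.eigvals(J).real))

def linear_flow(J, x0, t):
    """Solve x'(t) = J x(t), x(0) = x0, via eigendecomposition.

    Assumes J is diagonalizable, which holds for the example below.
    """
    eigvals, eigvecs = np.linalg.eig(J)
    coeffs = np.linalg.solve(eigvecs, x0.astype(complex))
    return (eigvecs @ (np.exp(eigvals * t) * coeffs)).real

# Hypothetical Jacobian at the equilibrium 0, with eigenvalues -1 +/- 2i.
J = np.array([[-1.0, 2.0], [-2.0, -1.0]])
alpha = spectral_abscissa(J)
x0 = np.array([0.1, 0.0])

# Compare the distance to the equilibrium with the bound C * exp(alpha * t),
# taking C = ||x0|| (valid here because this particular J is a normal matrix).
for t in (0.5, 1.0, 2.0, 4.0):
    dist = np.linalg.norm(linear_flow(J, x0, t))
    bound = np.linalg.norm(x0) * np.exp(alpha * t)
    print(f"t={t}: distance={dist:.6f}, bound={bound:.6f}")
```

For a nonlinear $g$ the same check applies to the Jacobian evaluated at the equilibrium, but the bound then only holds for initial conditions within the radius $\delta$ of the proposition.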

Figures (18)

  • Figure 1: Comparison of ${\mathcal{O}}(h^3)$ and ${\mathcal{O}}(h^2)$-local error models with payoff function $f(x,y) = xy$.
  • Figure 2: Distribution of the maximal real part of the eigenvalues of ${\mathcal{J}}_S$, which governs the local behavior according to Proposition \ref{['Jacobiancr']}. The black region marks parameter values for which the dynamics diverge. Smaller momentum expands the range of step sizes for which the dynamics converge, supporting Corollary \ref{['nmics']}. For small step sizes, the optimal momentum is positive, consistent with Theorem \ref{['t42']}.
  • Figure 3: Trajectories and average slopes for test functions 1 and 2. The background color in the trajectory plots represents the magnitude of the gradient norm, i.e., $\lVert \nabla_x f\rVert^2 + \lVert \nabla_y f\rVert^2$.
  • Figure 4: Experimental results for GAN training dynamics. Smaller momentum and alternating updates lead the trajectories to lower average slopes. Trajectories with lower average slopes also have lower FID, indicating better GAN training outcomes.
  • Figure 5: The test function is $f(x,y) = x(y-0.45) + \phi(x) - \phi(y)$, with $\phi(z) = \frac{1}{4}z^2 - \frac{1}{2}z^4+\frac{1}{6}z^6$. This function is also used in the top-left panel of Figure 1 in (Compagnoni et al., 2024) to compare the trajectories of algorithms with those of their SDE models. Here we set step size $h=0.001$ and $\beta = -0.4$ for the heavy ball method. The trajectories converge to limit cycles. As \ref{['t1']} and \ref{['t1a']} show, the trajectories of the discrete-time algorithms closely match our continuous-time equations. In \ref{['d1']} and \ref{['d1a']}, we show the Euclidean distance between these trajectories. Specifically, the distance between trajectories remains around $0.01$ after $100000$ iterations.
  • ...and 13 more figures
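
To make the simultaneous and alternating schemes concrete, the following minimal sketch runs heavy ball on the Figure 5 test function with $h = 0.001$ and $\beta = -0.4$. Several things here are assumptions, because this page does not state them: the update is written in the standard Polyak form $z_{k+1} = z_k \mp h \nabla f + \beta (z_k - z_{k-1})$, $x$ is taken to be the minimizing player and $y$ the maximizing player, the alternating variant lets $y$ use the freshly updated $x$, and the starting point $(0.5, 0.5)$ and the function names are hypothetical. The paper's exact formulation may differ.

```python
import math

def dphi(z):
    """Derivative of phi(z) = z^2/4 - z^4/2 + z^6/6."""
    return 0.5 * z - 2.0 * z**3 + z**5

def grad_f(x, y):
    """Partial derivatives of f(x, y) = x*(y - 0.45) + phi(x) - phi(y)."""
    return (y - 0.45) + dphi(x), x - dphi(y)

def heavy_ball(x0, y0, h=0.001, beta=-0.4, steps=20000, alternating=False):
    """Heavy ball for min_x max_y f(x, y); assumed Polyak-style update.

    Simultaneous: both players use gradients at (x_k, y_k).
    Alternating:  y uses the gradient at (x_{k+1}, y_k).
    """
    x_prev, y_prev = x0, y0  # zero initial momentum
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        gx, _unused = grad_f(x, y)
        x_next = x - h * gx + beta * (x - x_prev)      # descent step for x
        x_for_y = x_next if alternating else x
        _unused, gy = grad_f(x_for_y, y)
        y_next = y + h * gy + beta * (y - y_prev)      # ascent step for y
        x_prev, y_prev, x, y = x, y, x_next, y_next
        path.append((x, y))
    return path

sim = heavy_ball(0.5, 0.5)                    # hypothetical starting point
alt = heavy_ball(0.5, 0.5, alternating=True)
gap = math.hypot(sim[-1][0] - alt[-1][0], sim[-1][1] - alt[-1][1])
print("simultaneous end point:", sim[-1])
print("alternating end point: ", alt[-1])
print("distance between end points:", gap)
```

The two schemes share the same $x$ update and differ only in which $x$ the $y$ player sees, which is why the first iterates already separate slightly in the $y$ coordinate.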

Theorems & Definitions (41)

  • Proposition 2.1: muehlebach2021optimization
  • Theorem 3.1
  • Proposition 3.2: Consistency with gidel2019negative
  • Lemma 4.1
  • Proposition 4.2: Jacobian for simultaneous updates
  • Proposition 4.3: Jacobian for alternating updates
  • Theorem 4.6
  • Corollary 4.7
  • Theorem 4.8
  • Theorem 4.9
  • ...and 31 more