Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality

Sebastian Kassing, Simon Weissmann

TL;DR

The paper analyzes Polyak's heavy ball method for nonconvex $C^4$ objectives that satisfy the Polyak–Lojasiewicz (PL) inequality and shows that local acceleration is achievable in both continuous and discrete time. It develops a differential-geometric PL framework that separates the normal and tangential dynamics around the manifold of minimizers, which yields accelerated local rates and optimal parameter choices. In continuous time the rate is governed by $m(\alpha)$; in discrete time it is governed by $m(\gamma,\beta)$, and the optimal parameters achieve $m(\gamma,\beta)=\tfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}$ with $\kappa = L/\mu$. The analysis relies on a normal-bundle chart and a purely geometric argument rather than Lyapunov functions, and it shows that acceleration persists locally even for aggressive momentum choices under which global convergence fails. Numerical experiments corroborate the theoretical predictions and illustrate a trade-off: faster asymptotic rates come with a longer burn-in phase before the iterates enter the local PL regime.
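As a rough illustration of the discrete-time rate, the sketch below assumes Polyak's classical 1964 tuning of the update $x_{k+1} = x_k - \gamma\nabla f(x_k) + \beta(x_k - x_{k-1})$, namely $\gamma = 4/(\sqrt{L}+\sqrt{\mu})^2$ and $\beta = \big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\big)^2$, and compares the resulting contraction factor with that of gradient descent with step $2/(L+\mu)$. The function names are hypothetical, and the paper's exact parametrization of $m(\gamma,\beta)$ may differ from this textbook form.

```python
import math


def polyak_heavy_ball_parameters(mu: float, L: float) -> tuple[float, float, float]:
    """Polyak's classical tuning (assumed here) for the heavy ball update
    x_{k+1} = x_k - gamma * grad f(x_k) + beta * (x_k - x_{k-1}).

    Returns (gamma, beta, rate) with rate = (sqrt(kappa) - 1) / (sqrt(kappa) + 1),
    where kappa = L / mu.
    """
    if not 0.0 < mu <= L:
        raise ValueError("need 0 < mu <= L")
    sqrt_mu, sqrt_L = math.sqrt(mu), math.sqrt(L)
    gamma = 4.0 / (sqrt_L + sqrt_mu) ** 2          # step size
    rate = (sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)  # equals (sqrt(kappa) - 1) / (sqrt(kappa) + 1)
    beta = rate ** 2                                # momentum parameter
    return gamma, beta, rate


def gradient_descent_rate(mu: float, L: float) -> float:
    """Contraction factor (kappa - 1) / (kappa + 1) of gradient descent with step 2 / (L + mu)."""
    return (L - mu) / (L + mu)


for kappa in (10.0, 100.0, 1000.0):
    _, _, hb_rate = polyak_heavy_ball_parameters(1.0, kappa)
    gd_rate = gradient_descent_rate(1.0, kappa)
    print(f"kappa={kappa:g}: heavy ball {hb_rate:.4f}, gradient descent {gd_rate:.4f}")
```

For $\kappa = 100$ the heavy ball factor is $9/11 \approx 0.82$ versus $99/101 \approx 0.98$ for gradient descent, which is the $\sqrt{\kappa}$-type improvement that "acceleration" refers to here.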

Abstract

In this work, we analyze the convergence of Polyak's heavy ball method in both continuous and discrete time for non-convex $C^4$-objective functions satisfying the Polyak-Lojasiewicz inequality. Under this weak assumption, we recover the asymptotic convergence rates originally derived by Polyak in [Polyak, U.S.S.R. Comput. Math. and Math. Phys., 1964] for strongly convex objectives. Our results demonstrate that the heavy ball method exhibits asymptotic local acceleration on this class of functions. In particular, in the discrete time setting, we prove local convergence of the iterates to a minimum once the method enters a sufficiently small neighborhood of the set of minima, for a broad range of hyperparameters, including aggressive choices for the momentum parameter and the step-size for which global convergence is known to fail. Instead of the usually employed Lyapunov-type arguments, our approach leverages a new differential geometric perspective of the Polyak-Lojasiewicz inequality proposed in [Rebjock and Boumal, Math. Program., 2025].
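To make the setting of the abstract concrete, the following sketch runs the heavy ball iteration on a hypothetical objective $f(x,y) = \frac{1}{2}(y-\sin x)^2$, which is not taken from the paper. Its minimizers form the curve $y = \sin x$ rather than an isolated point, and since that set is not convex, $f$ is not convex. Writing $r = y - \sin x$, one has $\frac{1}{2}\|\nabla f\|^2 = \frac{1}{2}r^2(1+\cos^2 x) \ge f$, so the PL inequality holds with $\mu = 1$; on the curve the Hessian has the single nonzero eigenvalue $1+\cos^2 x \le 2$. The sketch therefore assumes the local constants $\mu = 1$, $L = 2$ together with Polyak's classical tuning; all names are illustrative.

```python
import math


def f(x: float, y: float) -> float:
    """Hypothetical non-convex objective; its minimizers form the curve y = sin(x)."""
    return 0.5 * (y - math.sin(x)) ** 2


def grad_f(x: float, y: float) -> tuple[float, float]:
    r = y - math.sin(x)            # residual, so that f = r^2 / 2
    return -r * math.cos(x), r     # chain rule: r * grad(r), with grad(r) = (-cos x, 1)


def heavy_ball(x0, y0, gamma, beta, iters=200):
    """Heavy ball: z_{k+1} = z_k - gamma * grad f(z_k) + beta * (z_k - z_{k-1})."""
    x_prev, y_prev = x0, y0        # z_{-1} = z_0, i.e. zero initial momentum
    x, y = x0, y0
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x_new = x - gamma * gx + beta * (x - x_prev)
        y_new = y - gamma * gy + beta * (y - y_prev)
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return x, y


# Assumed local constants near the curve of minimizers: mu = 1, L = 2.
mu, L = 1.0, 2.0
gamma = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

x, y = heavy_ball(0.5, 0.2, gamma, beta)
print(f"limit point ({x:.6f}, {y:.6f}), f = {f(x, y):.3e}")
```

The iterates settle at some point of the curve of minimizers determined by the starting point, not at a prescribed minimizer, which mirrors the local convergence to a (non-isolated) minimum described above.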

Paper Structure

This paper contains 15 sections, 12 theorems, 112 equations, 3 figures, 1 table.

Key Result

Theorem 1.2

Let $f:{\mathbb R}^d \to {\mathbb R}$ be continuously differentiable and $x^\ast \in {\mathbb R}^d$ be a $(\mu,L)$-regular point. Let $(x_t,v_t)_{t\ge0}$ be a solution of the heavy ball ODE (eq:HBODE) with initial condition $x_0, v_0\in{\mathbb R}^d$ and friction parameter $\alpha >0$. If $(x_t,v_t) \to (x^\ast,0)$ as $t\to\infty$, then … where $m(\alpha)=\frac{1}{2}\left(\alpha-\sqrt{\max(0,\alpha^2-4\mu)}\right)$. In particular, $m(\alpha)$ …
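The rate $m(\alpha)$ can be read off a linearization. Assuming that near a minimizer each normal direction of the heavy ball ODE behaves like the scalar equation $\ddot{z} + \alpha\dot{z} + c z = 0$ with curvature $c \ge \mu$, the slowest decay rate is $\min|\mathrm{Re}\,\lambda|$ over the roots of $\lambda^2 + \alpha\lambda + c = 0$. The minimal sketch below (helper names are hypothetical) checks that this matches $m(\alpha)$ for $c = \mu$, that larger curvatures decay at least as fast, and that $m$ is maximized at $\alpha = 2\sqrt{\mu}$, where it equals $\sqrt{\mu}$.

```python
import cmath
import math


def m(alpha: float, mu: float) -> float:
    """Local rate m(alpha) = (alpha - sqrt(max(0, alpha^2 - 4 mu))) / 2 from Theorem 1.2."""
    return 0.5 * (alpha - math.sqrt(max(0.0, alpha * alpha - 4.0 * mu)))


def slowest_decay_rate(alpha: float, c: float) -> float:
    """Slowest exponential decay rate of z'' + alpha z' + c z = 0.

    Uses the roots of lambda^2 + alpha * lambda + c = 0 and returns min(-Re(lambda)).
    """
    disc = cmath.sqrt(alpha * alpha - 4.0 * c)
    roots = ((-alpha + disc) / 2.0, (-alpha - disc) / 2.0)
    return min(-root.real for root in roots)


mu = 0.01
alphas = [0.05 * k for k in range(1, 41)]
best_alpha = max(alphas, key=lambda a: m(a, mu))
print(f"best alpha on grid: {best_alpha:.2f}, m = {m(best_alpha, mu):.4f}, sqrt(mu) = {math.sqrt(mu):.4f}")
```

For $\mu < 1$ the optimal value $\sqrt{\mu}$ exceeds $\mu$, the slowest local rate of gradient flow $\dot{x} = -\nabla f(x)$ near such a minimizer, which is the continuous-time form of acceleration.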

Figures (3)

  • Figure 1: Visual illustration of the chart $\Phi$ from the chart lemma (lem:chart).
  • Figure 2: A three-dimensional surface plot of the objective function, illustrating the geometry of the optimization landscape. The set of global minimizers is shown as a solid black line.
  • Figure 3: (Left) Evolution of the objective function values along the iterates for four different choices of estimated PL constants. Solid lines (in different colors) show the empirical losses, while the corresponding dashed lines represent the theoretical convergence rates predicted by our analysis, using the same color coding. (Right) Values of the local PL ratio $\mu(z):=\frac{1}{2}\frac{g'(z)^2}{g(z)}$ evaluated along the real line, illustrating how the effective PL constant varies with $z$.
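Figure 3 evaluates the local PL ratio $\mu(z)=\frac{1}{2}\frac{g'(z)^2}{g(z)}$ along the real line. The objective $g$ used in the figure is not given in this summary, so the sketch below assumes the hypothetical non-convex function $g(z) = z^2 + 3\sin^2 z$ (its second derivative $2 + 6\cos 2z$ changes sign), whose unique minimizer is $z = 0$ with $g(0) = 0$. On the sampled grid the ratio stays bounded away from zero, so a global PL constant exists, while near the minimizer it approaches the curvature $g''(0) = 8$; this is the kind of variation the right panel illustrates.

```python
import math


def g(z: float) -> float:
    """Hypothetical non-convex 1-D objective with unique minimizer z = 0 and g(0) = 0."""
    return z * z + 3.0 * math.sin(z) ** 2


def dg(z: float) -> float:
    """Derivative g'(z) = 2z + 3 sin(2z)."""
    return 2.0 * z + 3.0 * math.sin(2.0 * z)


def local_pl_ratio(z: float) -> float:
    """Local PL ratio mu(z) = g'(z)^2 / (2 g(z)), defined wherever g(z) > 0, i.e. z != 0."""
    return 0.5 * dg(z) ** 2 / g(z)


grid = [k / 100 for k in range(-1000, 1001) if k != 0]
ratios = [local_pl_ratio(z) for z in grid]
z_min = grid[ratios.index(min(ratios))]
print(f"smallest ratio on [-10, 10]: {min(ratios):.4f} at z = {z_min:.2f}")
print(f"ratio near the minimizer: {local_pl_ratio(1e-4):.4f} (curvature g''(0) = 8)")
```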

Theorems & Definitions (24)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 3.1: See Theorem 2.16 and Corollary 2.17 in [Rebjock and Boumal, Math. Program., 2025]
  • Lemma 3.2
  • proof
  • Lemma 4.1
  • Remark 4.2
  • Lemma 4.3
  • proof
  • ...and 14 more