Acceleration for Polyak-Łojasiewicz Functions with a Gradient Aiming Condition

Julien Hermant

TL;DR

The paper investigates when momentum accelerates convergence for Polyak-Łojasiewicz (PL) functions. It shows that PL alone does not guarantee acceleration and that strong quasar-convexity can also be insufficient. It introduces the gradient aiming condition AC^a, which quantifies how well the descent direction is aligned with the direction of the minimizer, and proves accelerated convergence for momentum methods under AC^a when the alignment constant is large enough, with explicit continuous-time and discrete-time bounds. It then relaxes AC^a to an average aiming condition along the optimization path, showing that acceleration can persist on average even if AC^a fails globally. A 2D PL counterexample and neural-network experiments illustrate when momentum helps and when it hinders, and offer guidance for designing accelerated first-order methods in nonconvex settings.
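
To make the alignment idea concrete, here is a minimal sketch that tracks the cosine of the angle between the descent direction and the direction to the minimizer along gradient descent iterates. It uses the function $F_{0.001}$ and the initialization $(0,3)$ from the Figure 3 caption. The normalized cosine is only an assumed proxy for AC^a (the paper's exact definition is not reproduced here), and the step size `step` and the helper names are hypothetical.

```python
import numpy as np

EPS = 1e-3  # regularization weight, taken from the Figure 3 caption

def F(p):
    """F_eps(x, y) = 0.5 * (y - sin(x))^2 + eps * 0.5 * x^2, minimized at (0, 0)."""
    x, y = p
    return 0.5 * (y - np.sin(x)) ** 2 + EPS * 0.5 * x ** 2

def grad_F(p):
    """Gradient of F_eps, computed by hand."""
    x, y = p
    r = y - np.sin(x)  # residual y - sin(x)
    return np.array([-r * np.cos(x) + EPS * x, r])

def aiming_cosine(p, p_star):
    """Cosine between -grad F(p) and p_star - p (assumed proxy for AC^a)."""
    g = grad_F(p)
    d = p_star - p
    return float(np.dot(-g, d) / (np.linalg.norm(g) * np.linalg.norm(d)))

p_star = np.zeros(2)       # minimizer of F_eps
p = np.array([0.0, 3.0])   # initialization from the Figure 3 caption
step = 0.1                 # hypothetical step size, not from the paper

cosines, values = [], []
for _ in range(100):
    cosines.append(aiming_cosine(p, p_star))
    values.append(F(p))
    p = p - step * grad_F(p)  # plain gradient descent update
```

A negative entry in `cosines` means the descent direction points away from the minimizer at that iterate, which is the regime where the Figure 3 caption reports momentum being harmful.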

Abstract

It is known that when minimizing smooth Polyak-Łojasiewicz (PL) functions, momentum algorithms cannot significantly improve the convergence bound of gradient descent, contrasting with the acceleration phenomenon occurring in the strongly convex case. To bridge this gap, the literature has proposed strongly quasar-convex functions as an intermediate non-convex class, for which accelerated bounds have been suggested to persist. We show that this is not true in general: the additional structure of strong quasar-convexity does not suffice to guarantee better worst-case bounds for momentum compared to gradient descent. As an alternative, we study PL functions under an aiming condition that measures how well the descent direction points toward a minimizer. This perspective clarifies the geometric ingredient enabling provable acceleration by momentum when minimizing PL functions.
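
As a point of comparison, the acceleration phenomenon in the strongly convex case can be sketched on an ill-conditioned quadratic. The code below uses the textbook constant-momentum Nesterov scheme with step $1/L$ and momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, which is not necessarily the exact momentum scheme analyzed in the paper. The test problem, the dimension, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def run_gd(grad, x0, step, n_iters):
    """Plain gradient descent: x_{k+1} = x_k - step * grad(x_k)."""
    x = x0.copy()
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

def run_nesterov(grad, x0, step, beta, n_iters):
    """Constant-momentum Nesterov scheme (textbook strongly convex variant)."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iters):
        y = x + beta * (x - x_prev)         # extrapolation step
        x_prev, x = x, y - step * grad(y)   # gradient step at the extrapolated point
    return x

# Hypothetical test problem: diagonal quadratic with spectrum in [mu, L].
mu, L = 1e-2, 1.0
eigs = np.linspace(mu, L, 20)

def f(x):
    return 0.5 * np.sum(eigs * x ** 2)

def grad(x):
    return eigs * x

x0 = np.ones(20)
kappa = L / mu
beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

f_gd = f(run_gd(grad, x0, 1.0 / L, 200))
f_nag = f(run_nesterov(grad, x0, 1.0 / L, beta, 200))
print(f"GD: {f_gd:.3e}  Nesterov: {f_nag:.3e}")
```

On this quadratic, gradient descent contracts at a rate governed by $\mu/L$ while the momentum scheme contracts at a rate governed by $\sqrt{\mu/L}$; the abstract's point is that this gap is not guaranteed once strong convexity is weakened to PL.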

Paper Structure (42 sections, 23 theorems, 232 equations, 11 figures, 1 table)


Introduction
Contributions
Related Works on Aiming Conditions
Background
Algorithms and Equations
Classical setting for acceleration
Relaxed setting
Acceleration under Strong Quasar-Convexity: Hidden Pitfalls
Specific Problems of Strong Quasar-Convexity
Acceleration for PL Functions with an Aiming Condition
$\text{AC}^{a}$ Gives Weight to the Local Information at $x^\ast$
Acceleration using Momentum with Large Enough Aiming Condition Constant
Relaxed Setting: Aiming Condition on Average
Numerical Experiment
An Example with Negative Aiming Condition
...and 27 more sections

Key Result

Theorem 4.2

Let $(x_t)_{t\ge 0}$ be a solution of the gradient flow (\ref{eq:gf}). (i) If $f \in {\text{PL}^\mu}$, then [...]. (ii) If $f \in {\text{PL}^\mu} \cap \text{AC}^{a} \cap \text{QG}_{+}^{L_0}$, then [...], where $\mu_0 := \sup\{\mu' \ge\mu : f \in \text{QG}_{-}^{\mu'}\}$.
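
For orientation, the textbook argument for a PL-type bound along the gradient flow can be written in one line. This is a sketch under two assumptions that are not taken from the theorem statement: the flow is $\dot{x}_t = -\nabla f(x_t)$, and $\text{PL}^\mu$ is normalized as $\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^\ast)$. It is not the paper's statement of (i) or (ii).

```latex
\frac{d}{dt}\bigl(f(x_t) - f^\ast\bigr)
  = \langle \nabla f(x_t), \dot{x}_t \rangle
  = -\|\nabla f(x_t)\|^2
  \le -2\mu\,\bigl(f(x_t) - f^\ast\bigr)
\quad\Longrightarrow\quad
f(x_t) - f^\ast \le e^{-2\mu t}\,\bigl(f(x_0) - f^\ast\bigr).
```

The last implication is Grönwall's inequality applied to $t \mapsto f(x_t) - f^\ast$.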

Figures (11)

Figure 1: Left: graph of $f(t) = 5(t+0.19\sin(5t))^2$, minimizer is $0$. Right: graph of $f(x,y) = 0.5(0.5x^2 -y)^2 + 0.05x^2$, minimizer is $(0,0)$. Bottom: table giving, for each function, the numerical value of the theoretical convergence rate for gradient descent (\ref{eq:gd}) or the momentum method (\ref{eq:nm}), specifying which function class is used to characterize the bound (see Table \ref{table:sc_sqc_rates}), where $\Lambda$ is the \ref{cr}. See implementation details in Appendix \ref{app:param_details}. Depending on the function, the ${\text{PL}^\mu}$-based bound may be sharper than the $\text{SQC}_{\tau}^{\mu}$-based bound, and conversely.
Figure 2: For a range of parameters $\tau$, we compute the highest admissible $\mu$ ensuring $f \in \text{SQC}_{\tau}^{\mu}$, and plot the numerical values of the associated theoretical convergence rates for (\ref{eq:gd}) and (\ref{eq:nm}) under $\text{SQC}_{\tau}^{\mu} \cap \text{LS}^{L}$, namely $\tau \mu/L$ and $\tau \sqrt{\mu/L}$, with $f(t) = 5(t+0.19\sin(5t))^2$. See implementation details in Appendix \ref{app:param_details}. Note that the pair $(\tau, \mu)$ that maximizes the convergence rate of (\ref{eq:gd}) differs from the pair that maximizes that of (\ref{eq:nm}).
Figure 3: Left: Heatmap of $F_{0.001}(x,y) = 0.5(y-\sin(x))^2 + 0.001\cdot0.5x^2$. Blue arrows indicate the descent direction $-\nabla F_{0.001}$. Center: First $1000$ iterations of the trajectories of (\ref{eq:gd}) and (\ref{eq:nm_prime}), both starting from the initialization point $(0,3)$, for different values of the momentum parameter $\alpha$. Top right: Corresponding decrease of $\log(f)$. Bottom right: Values of the aiming condition along the iterates, zoomed in on the first 100 iterations. Early negative aiming condition values cause momentum to drive the trajectory away from the minimizer, so that (\ref{eq:gd}) outperforms momentum in early iterations, with an increasing effect for larger $\alpha$.
Figure 4: Visualization of functions belonging to $\text{SQC}_{\tau}^{\mu}$, for some parameters. The figure is borrowed from hermant2025continuized; see details in Appendix \ref{app:conceptual_figures}.
Figure 5: The grey curve is $f(t) = 0.5 \cdot 5(t+0.07\sin(13t))^2$. The blue (red) curve is a quadratic upper (lower) bound. We compute the largest $\mu$ and smallest $L$ such that $f \in {\text{PL}^\mu} \cap \text{LS}^{L}$ on the displayed domain, numerically evaluated at $\mu \approx 4\cdot10^{-2}$, $L \approx 6\cdot 10^2$. The parameters of the quadratic bounds are $\mu_0 \approx 3$ and $L_0 \approx 18$. On this prototype 1-dimensional example, $\frac{\mu_0}{L_0} \approx 0.2$ while $\frac{\mu}{L} \approx 7\cdot 10^{-5}$. This highlights the possibility of a significant gap between these two ratios.
...and 6 more figures
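
The constants quoted in the Figure 5 caption can be estimated on a grid. The sketch below rests on assumptions that are not stated in this excerpt: $\text{PL}^\mu$ is read as $\tfrac{1}{2}|f'(t)|^2 \ge \mu\,(f(t) - f^\ast)$, the quadratic bounds are read as $\tfrac{\mu_0}{2}t^2 \le f(t) \le \tfrac{L_0}{2}t^2$, and the domain $[-1, 1]$ is hypothetical. The smoothness constant $L$ is left out because it depends on the displayed domain, which is not specified here.

```python
import numpy as np

def f(t):
    """Function from the Figure 5 caption; minimum value 0 at t = 0."""
    return 0.5 * 5.0 * (t + 0.07 * np.sin(13.0 * t)) ** 2

def df(t):
    """Derivative of f, computed by hand (0.91 = 0.07 * 13)."""
    g = t + 0.07 * np.sin(13.0 * t)
    return 5.0 * g * (1.0 + 0.91 * np.cos(13.0 * t))

# Hypothetical evaluation domain; the caption's displayed domain is not given.
t = np.linspace(-1.0, 1.0, 200001)
t = t[f(t) > 0.0]  # drop the minimizer, where the ratios below are 0/0

# Largest mu with 0.5 * f'(t)^2 >= mu * (f(t) - f*) on the grid (assumed PL form).
mu_hat = np.min(df(t) ** 2 / (2.0 * f(t)))

# Tightest constants in (mu0 / 2) t^2 <= f(t) <= (L0 / 2) t^2 on the grid.
ratio = 2.0 * f(t) / t ** 2
mu0_hat, L0_hat = np.min(ratio), np.max(ratio)

print(f"mu ~ {mu_hat:.4f}, mu0 ~ {mu0_hat:.3f}, L0 ~ {L0_hat:.3f}")
```

Comparing `mu0_hat / L0_hat` with `mu_hat` divided by a smoothness constant for the chosen domain gives a numerical view of the gap between the two ratios discussed in the caption.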

Theorems & Definitions (40)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 4.1
Theorem 4.2
Theorem 4.3
Theorem 4.4
Theorem 5.2
...and 30 more
