Acceleration for Polyak-Łojasiewicz Functions with a Gradient Aiming Condition
Julien Hermant
TL;DR
The paper investigates when momentum acceleration improves convergence for Polyak-Łojasiewicz (PL) functions. It highlights that the PL condition alone does not guarantee acceleration and that strong quasar-convexity can also be insufficient. It introduces the gradient aiming condition AC^a, which quantifies how well the descent direction points toward a minimizer, and proves accelerated convergence for momentum methods under AC^a when the alignment is large, with explicit continuous-time and discrete-time bounds. It further relaxes AC^a to an average-aiming condition along the optimization path, showing that acceleration can persist on average even when AC^a fails globally. Through a 2D PL counterexample and neural-network experiments, the work clarifies when momentum helps or hinders and offers practical guidance for designing accelerated first-order methods in nonconvex settings.
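To make the idea of aiming concrete, the following Python sketch runs plain gradient descent on a hypothetical 2D test function f(x, y) = x² + 3 sin²(x) + y², whose 1-D part x² + 3 sin²(x) is a standard non-convex PL example, and records at each iterate how well the descent direction −∇f points toward the minimizer x* = (0, 0). The alignment measure used here (a cosine) and its arithmetic mean over the path are illustrative stand-ins chosen for this sketch; they are not the paper's definitions of AC^a or of the average-aiming condition, and no momentum method is run.

```python
import math

# Hypothetical test function (not from the paper): non-convex in x, PL overall.
def f(p):
    x, y = p
    return x * x + 3.0 * math.sin(x) ** 2 + y * y

def grad_f(p):
    x, y = p
    # d/dx [3 sin^2 x] = 3 sin(2x)
    return (2.0 * x + 3.0 * math.sin(2.0 * x), 2.0 * y)

def alignment(p, p_star=(0.0, 0.0)):
    # Stand-in aiming measure (assumption): cosine between the descent
    # direction -grad f(p) and the direction p* - p toward the minimizer.
    g = grad_f(p)
    d = (p_star[0] - p[0], p_star[1] - p[1])
    ng = math.hypot(g[0], g[1])
    nd = math.hypot(d[0], d[1])
    if ng == 0.0 or nd == 0.0:
        return 1.0  # at the minimizer: treat as perfectly aligned (convention)
    return (-g[0] * d[0] - g[1] * d[1]) / (ng * nd)

def gd_path(p0, step=1.0 / 8.0, n_iter=50):
    # Gradient descent with step 1/L, where L = 8 bounds the Hessian of f.
    path = [p0]
    p = p0
    for _ in range(n_iter):
        g = grad_f(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
        path.append(p)
    return path

path = gd_path((2.5, 1.0))
cosines = [alignment(p) for p in path]
print("min alignment:", min(cosines))
print("average alignment:", sum(cosines) / len(cosines))
print("f at start / end:", f(path[0]), f(path[-1]))
```

A minimum alignment close to 1 would correspond to a path on which the gradient aims almost straight at the minimizer, while the path average is only a rough analogue of the averaged relaxation described above, again under the stand-in measure assumed here.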
Abstract
It is known that when minimizing smooth Polyak-Łojasiewicz (PL) functions, momentum algorithms cannot significantly improve the convergence bound of gradient descent, in contrast with the acceleration phenomenon that occurs in the strongly convex case. To bridge this gap, the literature has proposed strongly quasar-convex functions as an intermediate non-convex class, for which accelerated bounds have been suggested to persist. We show that this is not true in general: the additional structure of strong quasar-convexity does not suffice to guarantee better worst-case bounds for momentum methods than for gradient descent. As an alternative, we study PL functions under an aiming condition that measures how well the descent direction points toward a minimizer. This perspective clarifies the geometric ingredient that enables provable acceleration by momentum when minimizing PL functions.
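The acceleration phenomenon in the strongly convex case, which the abstract contrasts with the PL setting, can be observed numerically. The Python sketch below, assuming a diagonal strongly convex quadratic with condition number κ = 1000 as a swapped-in test problem (not taken from the paper), counts the iterations that gradient descent with step 1/L and the standard constant-momentum Nesterov scheme with β = (√κ − 1)/(√κ + 1) need to bring f(x) − f* below a tolerance.

```python
import math

# Swapped-in test problem (assumption): f(x) = 0.5 * sum_i lam_i * x_i^2,
# minimized at x* = 0 with f* = 0; mu = min(lams), L = max(lams).
def quad_value(lams, x):
    return 0.5 * sum(l * xi * xi for l, xi in zip(lams, x))

def quad_grad(lams, x):
    return [l * xi for l, xi in zip(lams, x)]

def iters_gd(lams, x0, tol, max_iter=100_000):
    # Plain gradient descent with step 1/L.
    L = max(lams)
    x = list(x0)
    for k in range(max_iter):
        if quad_value(lams, x) <= tol:
            return k
        g = quad_grad(lams, x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return max_iter

def iters_nesterov(lams, x0, tol, max_iter=100_000):
    # Nesterov momentum for strongly convex f with constant momentum beta.
    L, mu = max(lams), min(lams)
    kappa = L / mu
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    x_prev = list(x0)
    x = list(x0)
    for k in range(max_iter):
        if quad_value(lams, x) <= tol:
            return k
        # Extrapolate, then take a gradient step from the extrapolated point.
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = quad_grad(lams, y)
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
    return max_iter

lams = [1.0, 10.0, 100.0, 1000.0]  # mu = 1, L = 1000, so kappa = 1000
x0 = [1.0] * len(lams)
tol = 1e-8
n_gd = iters_gd(lams, x0, tol)
n_nest = iters_nesterov(lams, x0, tol)
print("gradient descent iterations:", n_gd)
print("Nesterov momentum iterations:", n_nest)
```

On such problems gradient descent typically needs on the order of κ log(1/ε) iterations while the momentum scheme needs on the order of √κ log(1/ε); the sketch only illustrates the strongly convex case and says nothing about PL functions, where, as the abstract notes, this gap need not persist.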
