New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results

Francesco Orabona, Ryan D'Orazio

TL;DR

The paper reframes the Polyak stepsize as gradient descent on a surrogate loss $\phi(x)=\tfrac{1}{2}(f(x)-f^*)^2$, with adaptive updates driven by a local curvature measure. It proves that $\phi$ is always locally star-upper curved and inherits curvature from $f$, enabling convergence analysis under minimal assumptions and yielding rates comparable to standard gradient methods when the local curvature is known. It generalizes to a broader surrogate family $\psi(x)=\tfrac{1}{2}h(x)^2$ and extends to stochastic variants, showing SPS$_{\max}$ is adaptive across curvature regimes and revealing potential neighbourhood convergence and instability when surrogates are imperfect or in stochastic settings. It also provides negative results demonstrating intrinsic non-convergence phenomena in certain surrogate configurations, underscoring limits tied to the surrogate's minimum and curvature. Overall, the work offers a unified geometric perspective for understanding and designing Polyak-type methods and motivates surrogate design as a route to new algorithms.
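
The surrogate view can be checked numerically. By the chain rule, $\nabla\phi(x)=(f(x)-f^*)\nabla f(x)$, so a gradient step on $\phi$ with stepsize $1/\|\nabla f(x)\|^2$ reproduces the classical Polyak update $x-\frac{f(x)-f^*}{\|\nabla f(x)\|^2}\nabla f(x)$. The sketch below illustrates this identity; the quadratic test function, the starting point, and all function names are assumptions chosen for illustration and do not come from the paper.

```python
# Hypothetical convex test problem (not from the paper):
# f(x) = x1^2 + 2*x2^2, minimized at the origin with f* = 0.
F_STAR = 0.0


def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def grad_f(x):
    return [2.0 * x[0], 4.0 * x[1]]


def polyak_step(x):
    """Classical Polyak update: x - (f(x) - f*) / ||g||^2 * g."""
    g = grad_f(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0.0:  # zero gradient: already at a minimizer of this f
        return list(x)
    eta = (f(x) - F_STAR) / g_sq
    return [xi - eta * gi for xi, gi in zip(x, g)]


def surrogate_gd_step(x):
    """Gradient step on phi(x) = 0.5 * (f(x) - f*)^2 with stepsize 1 / ||g||^2."""
    g = grad_f(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0.0:
        return list(x)
    gap = f(x) - F_STAR
    grad_phi = [gap * gi for gi in g]  # chain rule: grad phi = (f - f*) * grad f
    return [xi - gp / g_sq for xi, gp in zip(x, grad_phi)]


def run(x0, steps):
    """Iterate the Polyak update for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = polyak_step(x)
    return x
```

Calling `polyak_step` and `surrogate_gd_step` on the same point returns the same iterate up to floating-point rounding, which is the equivalence the surrogate perspective builds on.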

Abstract

The Polyak stepsize has been proven to be a fundamental stepsize in convex optimization, giving near-optimal gradient descent rates across a wide range of assumptions. The universality of the Polyak stepsize has also inspired many stochastic variants, with theoretical guarantees and strong empirical performance. Despite the many theoretical results, our understanding of the convergence properties and shortcomings of the Polyak stepsize and its variants is both incomplete and fractured across different analyses. We propose a new, unified, and simple perspective for the Polyak stepsize and its variants as gradient descent on a surrogate loss. We show that each variant is equivalent to minimizing a surrogate function with stepsizes that adapt to a guaranteed local curvature. Our general surrogate loss perspective is then used to provide a unified analysis of existing variants across different assumptions. Moreover, we show a number of negative results proving that the non-convergence results in some of the upper bounds are indeed real.

Paper Structure

This paper contains 13 sections, 21 theorems, 74 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $f(\boldsymbol{x})$ be convex and define $\boldsymbol{x}^\star \in \mathop{\mathrm{argmin}}_{\boldsymbol{x}} \ f(\boldsymbol{x})$. Define $\phi(\boldsymbol{x})=\frac{1}{2} (f(\boldsymbol{x})-f(\boldsymbol{x}^\star))^2$. Then, we have

Figures (2)

  • Figure 1: The function $f(x)=|x+2|+\frac{x^2}{2}$ is non-smooth but is $2$-LSUC as demonstrated by the blue curve, $f(x^\star)-\langle \boldsymbol{g} ,x^\star-x\rangle -1/4\|\boldsymbol{g}\|^2$, being larger than $f(x)$ for all $x$ and $\boldsymbol{g} \in \partial f(x)$. Similarly, $f$ is self-bounded but with the larger constant $L=9$.
  • Figure 2: Trajectories under the map $T$ (Eq. \ref{eq:det-gen-polyak}) for $h(x) = \frac{x^2}{2} + a$ with an unstable fixed point at $x^\star = 0$. Lack of convergence is observed for different values of $a$ as predicted by Proposition \ref{thm:null-example}.
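
The behaviour described in the Figure 2 caption can be explored with a small simulation. The exact form of $T$ is not reproduced in this summary, so the sketch below assumes the Polyak-type update $T(x) = x - h(x)\,h'(x)/|h'(x)|^2 = x - h(x)/h'(x)$, assumes $a > 0$, and stops if an iterate lands exactly on $x = 0$ where $h'(x) = 0$. These choices, the function names, and the parameter values are assumptions made for illustration, not the paper's definitions. Under them, $T(x) = x/2 - a/x$ has no fixed point with $x \neq 0$ when $a > 0$, so the iterates cannot settle.

```python
def h(x, a):
    """Surrogate base function from the Figure 2 caption: h(x) = x^2 / 2 + a."""
    return 0.5 * x * x + a


def h_prime(x):
    return x


def generalized_polyak_step(x, a):
    """Assumed form of T (not taken from the paper): x - h(x) / h'(x)."""
    return x - h(x, a) / h_prime(x)


def trajectory(x0, a, steps):
    """Iterate the assumed map, stopping early if x hits 0 where h'(x) = 0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        if x == 0.0:  # convention assumed here: treat x = 0 as a fixed point
            break
        xs.append(generalized_polyak_step(x, a))
    return xs


def sign_changes(xs):
    """Count how often consecutive iterates lie on opposite sides of 0."""
    return sum(1 for u, v in zip(xs, xs[1:]) if u * v < 0.0)
```

For example, `trajectory(1.0, 0.0, 20)` halves the iterate at every step and approaches $0$, whereas `trajectory(1.0, 0.1, 200)` keeps crossing $0$: an iterate that gets close to $0$ is thrown far away on the next step because of the $a/x$ term.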

Theorems & Definitions (49)

  • Definition 1
  • Definition 2
  • Definition 3: Local star upper curvature (LSUC)
  • Theorem 1: Curvature of the Polyak surrogate
  • Lemma 1
  • proof
  • Theorem 2
  • Definition 4: Approximate local-star-upper curvature
  • Lemma 2
  • proof
  • ...and 39 more