Revisiting the Polyak step size
Elad Hazan, Sham Kakade
TL;DR
The paper addresses parameter-free optimization. It shows that the simple Polyak step size $\eta_t = h_t / \|\nabla f(x_t)\|^2$, with $h_t = f(x_t) - f(x^*)$, achieves near-optimal convergence rates for gradient descent in every standard regime without prior knowledge of the problem constants. These regimes are general convex, β-smooth, α-strongly convex, and both β-smooth and α-strongly convex. It also introduces an adaptive variant that requires only a lower bound $\tilde{f}_0 \le f(x^*)$ and refines this bound as needed. The variant keeps essentially the same performance at the cost of a logarithmic overhead in the number of gradient updates. The key contributions are a unified analysis showing that the exact Polyak step size is near-optimal in all of these regimes, and a practical adaptive scheme that removes the need to know $f(x^*)$ a priori. Together, these results give a parameter-free approach to gradient-based optimization with clear theoretical guarantees.
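The exact Polyak step size above can be sketched as a short gradient descent loop. This is a minimal illustration, assuming $f(x^*)$ is known exactly (the non-adaptive setting). The function name `polyak_gd`, its parameters, and the NumPy-based interface are hypothetical choices for this sketch, not the paper's code, and the adaptive lower-bound refinement is not shown. The same loop also accepts subgradients, which covers the general (non-smooth) convex case.

```python
import numpy as np


def polyak_gd(f, grad, x0, f_star, num_steps=100):
    """Gradient descent with the exact Polyak step size (illustrative sketch).

    Hypothetical helper: assumes f_star = f(x*) is known exactly.
    `grad` may return a subgradient when f is non-smooth.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = np.asarray(grad(x), dtype=float)
        g_norm_sq = float(np.dot(g, g))
        h = f(x) - f_star  # optimality gap h_t = f(x_t) - f(x*)
        if g_norm_sq == 0.0 or h <= 0.0:
            break  # zero (sub)gradient or gap already closed: stop
        eta = h / g_norm_sq  # Polyak step eta_t = h_t / ||grad f(x_t)||^2
        x = x - eta * g
    return x
```

For example, on the hypothetical quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with `A = np.diag([1.0, 10.0])` and $f(x^*) = 0$, calling `polyak_gd(f, lambda x: A @ x, [1.0, 1.0], f_star=0.0, num_steps=1000)` drives $f$ close to zero without the caller supplying the smoothness constant 10 or the strong-convexity constant 1. The step size adapts to them through $h_t$ and $\|\nabla f(x_t)\|^2$.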
Abstract
This paper revisits the Polyak step size schedule for convex optimization problems, proving that a simple variant of it simultaneously attains near-optimal convergence rates for the gradient descent algorithm, for all ranges of strong convexity, smoothness, and Lipschitz parameters, without a priori knowledge of these parameters.
