Anytime Acceleration of Gradient Descent
Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen
TL;DR
This paper studies how to accelerate gradient descent (GD) with anytime convergence guarantees for smooth convex optimization. It introduces a predetermined stepsize schedule, built by concatenating primitive silver stepsize schedules, under which GD achieves $f(\bm{x}_T)-f^* = O\left(\frac{\lVert \bm{x}_1-\bm{x}^*\rVert^2}{T^{\vartheta}}\right)$ for every stopping time $T$, where $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho} \approx 1.119$ and $\rho=1+\sqrt{2}$. It also extends the result to smooth and strongly convex problems, obtaining exponential convergence $f(\bm{x}_T)-f^* = O\left(\exp(-\Omega(T/\kappa^{0.893}))\right)$, where $\kappa$ is the condition number. The construction hinges on joining the primitive schedules with carefully chosen join steps that preserve progress while controlling gradient norms between joins. The work resolves a COLT open question about anytime acceleration of GD, and the schedule requires no prior knowledge of the stopping time. The results matter for optimization tasks where the horizon is not known in advance and performance must hold at whatever iteration the run is stopped.
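As a quick numerical check of the constants quoted above, the following minimal sketch evaluates $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho}$ with $\rho = 1+\sqrt{2}$. The function name `anytime_exponent` is hypothetical and chosen here for illustration. Reading the exponent $0.893$ on $\kappa$ as $1/\vartheta$ is an assumption based on the numbers agreeing; the text above does not state that relationship.

```python
import math

# rho = 1 + sqrt(2), as defined in the TL;DR.
RHO = 1 + math.sqrt(2)


def anytime_exponent(rho: float = RHO) -> float:
    """Evaluate theta = 2*log2(rho) / (1 + log2(rho))."""
    log_rho = math.log2(rho)
    return 2 * log_rho / (1 + log_rho)


theta = anytime_exponent()

# Assumption (not stated in the text): the kappa exponent 0.893 is 1/theta.
kappa_exponent = 1 / theta

print(f"theta          = {theta:.4f}")
print(f"1/theta        = {kappa_exponent:.4f}")
```

Both values agree with the quoted $1.119$ and $0.893$ to within rounding in the third decimal place.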
Abstract
This work investigates stepsize-based acceleration of gradient descent with anytime convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.
