Anytime Acceleration of Gradient Descent

Zihan Zhang; Jason D. Lee; Simon S. Du; Yuxin Chen

TL;DR

This paper studies stepsize-based acceleration of gradient descent (GD) with anytime convergence guarantees for smooth convex optimization. It introduces a predetermined stepsize schedule, built by concatenating primitive silver-step schedules, under which GD achieves $f(\bm{x}_T)-f^* = O\left(\frac{\lVert \bm{x}_1-\bm{x}^*\rVert^2}{T^{\vartheta}}\right)$ for all stopping times $T$, where $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho} \approx 1.119$ and $\rho=1+\sqrt{2}$. It further extends the result to smooth and strongly convex problems, obtaining exponential convergence $f(\bm{x}_T)-f^* = O\left(\exp(-\Omega(T/\kappa^{0.893}))\right)$, where $\kappa$ is the condition number. The approach hinges on concatenating primitive schedules with carefully chosen join steps to preserve progress while controlling gradient norms between joins. This work resolves a COLT open problem on anytime acceleration of GD and yields an acceleration mechanism that requires no prior knowledge of the stopping time. The results are relevant to optimization tasks where the horizon is not known in advance and the guarantee must hold at every iteration.
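As a quick arithmetic check of the exponents quoted above, the following Python sketch evaluates $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho}$ for $\rho = 1+\sqrt{2}$. The function name `anytime_exponent` is hypothetical and used only for illustration. The comparison of $1/\vartheta$ with the strongly convex exponent $0.893$ is an observation about the numbers, not a relation stated in the text above.

```python
import math

# Silver ratio rho = 1 + sqrt(2), as defined in the TL;DR.
RHO = 1.0 + math.sqrt(2.0)


def anytime_exponent(rho=RHO):
    """Evaluate theta = 2*log2(rho) / (1 + log2(rho))."""
    log_rho = math.log2(rho)
    return 2.0 * log_rho / (1.0 + log_rho)


theta = anytime_exponent()

# Assumption (not stated in the source): the strongly convex exponent 0.893
# is compared here against 1/theta purely as a numerical observation.
inverse_theta = 1.0 / theta

print(f"log2(rho) = {math.log2(RHO):.4f}")
print(f"theta     = {theta:.4f}")
print(f"1/theta   = {inverse_theta:.4f}")
```

Since $\log_2\rho > 1$, the exponent $\vartheta$ exceeds 1, which is what makes the anytime rate $o(T^{-1})$.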

Abstract

This work investigates stepsize-based acceleration of gradient descent with anytime convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.
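The TL;DR names silver-step schedules as the building blocks of the anytime schedule. As background, the sketch below generates the standard silver stepsize schedule of length $2^k-1$ from the recursion $\pi^{(j+1)} = [\pi^{(j)},\ 1+\rho^{j-1},\ \pi^{(j)}]$ with $\pi^{(1)} = [\sqrt{2}]$, in units of $1/L$ for an $L$-smooth objective. This is the fixed-horizon construction from the silver stepsize literature, not the anytime schedule proposed here: the join steps used to concatenate primitive schedules are not specified in this section, so they are not reproduced. The function name `silver_stepsizes` is hypothetical.

```python
import math

RHO = 1.0 + math.sqrt(2.0)  # silver ratio


def silver_stepsizes(k):
    """Return the 2**k - 1 silver stepsizes, in units of 1/L.

    Background sketch of the standard silver schedule only; it does not
    include the join steps of the anytime schedule described in the text.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    schedule = [math.sqrt(2.0)]  # pi^(1)
    for j in range(1, k):
        # pi^(j+1) = [pi^(j), 1 + rho^(j-1), pi^(j)]
        schedule = schedule + [1.0 + RHO ** (j - 1)] + schedule
    return schedule


steps = silver_stepsizes(3)
print(len(steps))
print([round(h, 4) for h in steps])
```

Each level doubles the schedule and inserts one long step in the middle, so the schedule is symmetric and its stepsizes sum to $\rho^k - 1$.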

Paper Structure

Table of Contents

Key Result

Figures (2)

Theorems & Definitions (18)