Anytime Acceleration of Gradient Descent

Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen

TL;DR

This paper addresses the problem of accelerating gradient descent with anytime convergence guarantees for smooth convex optimization. It introduces a predetermined stepsize schedule, built via primitive stepsize concatenation of silver-step schedules, that ensures GD achieves $f(\bm{x}_T)-f^* = O\left(\frac{\lVert \bm{x}_1-\bm{x}^*\rVert^2}{T^{\vartheta}}\right)$ for all stopping times $T$, where $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho} \approx 1.119$ and $\rho=1+\sqrt{2}$; it further shows an extension to smooth and strongly convex problems with exponential convergence $f(\bm{x}_T)-f^* = O\left(\exp(-\Omega(T/\kappa^{0.893}))\right)$, where $\kappa$ is the condition number. The approach hinges on concatenating primitive schedules with carefully chosen join steps to preserve progress while controlling gradient norms between joins. This work resolves a COLT open question about anytime acceleration for GD and provides a practical, knowledge-free acceleration mechanism for GD applicable to general stopping times. The results have implications for optimization tasks where the horizon is not known in advance and robust, anytime performance is essential.
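The two exponents quoted above can be checked numerically. The sketch below evaluates $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho}$ with $\rho = 1+\sqrt{2}$ and also prints $1/\vartheta$. Reading the strongly convex exponent $0.893$ as a rounding of $1/\vartheta$ is an assumption made here for illustration, not something this summary states, and the function name `anytime_exponent` is hypothetical.

```python
import math


def anytime_exponent():
    """Return theta = 2*log2(rho) / (1 + log2(rho)) with rho = 1 + sqrt(2)."""
    rho = 1 + math.sqrt(2)       # the silver ratio
    log_rho = math.log2(rho)     # roughly 1.2716
    return 2 * log_rho / (1 + log_rho)


theta = anytime_exponent()
print(f"theta   = {theta:.4f}")      # exponent of the anytime rate T^(-theta)
# Assumption: the kappa exponent is compared against 1/theta for illustration.
print(f"1/theta = {1 / theta:.4f}")
```

The value of $\vartheta$ lies slightly above $1.119$, so the $O(T^{-1.119})$ statement is a mildly conservative rounding of the rate.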

Abstract

This work investigates stepsize-based acceleration of gradient descent with anytime convergence guarantees. For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve convergence guarantees of $O(T^{-1.119})$ for any stopping time $T$, where the stepsize schedule is predetermined without prior knowledge of the stopping time. This result provides an affirmative answer to a COLT open problem \citep{kornowski2024open} regarding whether stepsize-based acceleration can yield anytime convergence rates of $o(T^{-1})$. We further extend our theory to yield anytime convergence guarantees of $\exp(-\Omega(T/\kappa^{0.893}))$ for smooth and strongly convex optimization, with $\kappa$ being the condition number.

Paper Structure

This paper contains 21 sections, 10 theorems, 92 equations, 2 figures.

Key Result

Theorem 1

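There exists a stepsize schedule $\{\alpha_t\}_{t=1}^{\infty}$, generated without knowing the stopping time, such that the gradient descent iterates (the update referenced as eq:GD) obey $f(\bm{x}_T)-f^* = O\left(\frac{\lVert \bm{x}_1-\bm{x}^*\rVert^2}{T^{\vartheta}}\right)$ for an arbitrary stopping time $T\geq 1$, where $\vartheta = \frac{2\log_2\rho}{1+\log_2\rho}$ and $\rho=1+\sqrt{2}$. Throughout this paper, $\|\cdot\|$ denotes the $\ell_2$ norm.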

Figures (2)

  • Figure 1: Left: the first 128 steps of the silver stepsize schedule; Right: the first 128 steps of our stepsize schedule (with parameter $c$ adjusted for better illustration). The red bars indicate the positions of the join steps. The number of join steps in the first $t$ steps of the silver stepsize schedule is $\left\lfloor\log_2 t\right\rfloor$, whereas in our schedule, this number is roughly $\Omega(t^{\frac{\log_2\rho}{\log_2\rho+1}})$.
  • Figure 2: An illustration of our analysis strategy to bound $f_{\ell}-f^*$ for an intermediate step $\ell$. The yellow point indicates the initial step, the red points indicate the join steps, and $n_{\ell}$ denotes the largest join step below $\ell$.
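
The silver stepsize schedule in the left panel of Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the closed form $1+\rho^{\nu(t)-1}$ from the silver-stepsize literature, where $\nu(t)$ is the 2-adic valuation of $t$, with stepsizes normalized to a smoothness constant of 1, and it assumes the join steps sit at positions $2, 4, 8, \ldots$. The paper's Definition 5 is not reproduced on this page, so the function names and conventions below are illustrative and are not the authors' code.

```python
import math

RHO = 1 + math.sqrt(2)  # the silver ratio


def two_adic_valuation(t):
    """Largest k such that 2**k divides t, for an integer t >= 1."""
    return (t & -t).bit_length() - 1


def silver_stepsizes(n):
    """First n silver stepsizes, 1-indexed: 1 + RHO**(nu(t) - 1).

    Assumption: stepsizes are normalized so that the smoothness constant is 1.
    """
    return [1 + RHO ** (two_adic_valuation(t) - 1) for t in range(1, n + 1)]


def count_join_steps(t):
    """Number of positions 2, 4, 8, ... that are <= t, i.e. floor(log2(t)).

    Assumption: join steps of the silver schedule are the powers of two >= 2.
    """
    return t.bit_length() - 1


steps = silver_stepsizes(128)
print([round(s, 4) for s in steps[:8]])
print(count_join_steps(128))
```

Under these assumptions the schedule begins $\sqrt{2},\, 2,\, \sqrt{2},\, 2+\sqrt{2}$, and the join-step count agrees with the $\left\lfloor\log_2 t\right\rfloor$ figure in the caption. The paper's own schedule, which places join steps far more densely, is not sketched here because its construction is not given on this page.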

Theorems & Definitions (18)

  • Theorem 1
  • Definition 2: Primitive stepsize schedule
  • Lemma 3
  • Lemma 4
  • proof
  • Definition 5: Silver stepsize schedule
  • Lemma 6
  • proof
  • Lemma 7
  • Lemma 8
  • ...and 8 more