Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule
Jason M. Altschuler, Pablo A. Parrilo
TL;DR
The paper demonstrates that gradient descent can be accelerated without momentum by carefully designing time-varying, non-monotone stepsizes, and introduces the Silver Stepsize Schedule. This schedule yields a convergence rate that interpolates between the textbook unaccelerated rate and Nesterov-style acceleration, with a phase transition at horizon $n^* = \Theta(\kappa^{\log_{\rho} 2})$ and an overall iteration complexity of $n = \Theta(\kappa^{\log_{\rho} 2} \log(1/\varepsilon))$ to reach accuracy $\varepsilon$ on $\kappa$-conditioned functions, where $\rho = 1+\sqrt{2}$ is the silver ratio. The rate is proven via multi-step descent, recursive certificates, and hedging arguments, and the authors conjecture, with partial supporting evidence, that it is optimal among all stepsize schedules. The schedule is constructed recursively in a fully explicit way and has a fractal-like structure; its analysis rests on a rigorous certificate built from co-coercivity and interpolation conditions. The results extend to the non-strongly convex setting via black-box reductions and suggest new directions for acceleration that leave the GD algorithm itself unchanged. Overall, this work challenges the long-held belief that acceleration requires momentum by showing that dynamic, well-structured stepsizes alone can achieve substantially faster convergence in smooth convex optimization.
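To make the interpolation concrete, the sketch below evaluates the exponent $\log_\rho 2 = \ln 2 / \ln(1+\sqrt{2}) \approx 0.7864$ and compares the leading-order iteration counts $\kappa \log(1/\varepsilon)$, $\kappa^{\log_\rho 2} \log(1/\varepsilon)$, and $\sqrt{\kappa} \log(1/\varepsilon)$. It is a minimal Python illustration, not code from the paper: the helper name `iteration_scalings` is hypothetical, and all constant factors hidden by the asymptotic notation are dropped, so only the relative scaling is meaningful.

```python
import math

# Silver ratio rho = 1 + sqrt(2) and the Silver exponent log_rho(2) = ln 2 / ln rho.
RHO = 1.0 + math.sqrt(2.0)
SILVER_EXPONENT = math.log(2.0) / math.log(RHO)  # approximately 0.7864


def iteration_scalings(kappa: float, eps: float) -> dict[str, float]:
    """Hypothetical helper: leading-order iteration counts with constants dropped.

    Compares kappa * log(1/eps) (textbook constant-stepsize GD),
    kappa**log_rho(2) * log(1/eps) (Silver rate), and
    sqrt(kappa) * log(1/eps) (Nesterov's accelerated rate).
    """
    log_term = math.log(1.0 / eps)
    return {
        "unaccelerated": kappa * log_term,
        "silver": kappa ** SILVER_EXPONENT * log_term,
        "nesterov": math.sqrt(kappa) * log_term,
    }


if __name__ == "__main__":
    print(f"log_rho(2) = {SILVER_EXPONENT:.4f}")
    for kappa in (1e2, 1e4, 1e6):
        counts = iteration_scalings(kappa, eps=1e-6)
        summary = ", ".join(f"{name}={value:.3g}" for name, value in counts.items())
        print(f"kappa={kappa:.0e}: {summary}")
```

For $\kappa = 10^4$, for example, the Silver exponent gives roughly $10^{3.15} \approx 1.4 \times 10^3$ in place of $10^4$, while Nesterov's rate gives $10^2$ (each multiplied by $\log(1/\varepsilon)$).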
Abstract
Can we accelerate convergence of gradient descent without changing the algorithm, just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $\kappa^{\log_\rho 2} \approx \kappa^{0.7864}$ iterations, where $\rho = 1+\sqrt{2}$ is the silver ratio and $\kappa$ is the condition number. This is intermediate between the textbook unaccelerated rate $\kappa$ and the accelerated rate $\sqrt{\kappa}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_\rho 2} \approx \varepsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic with period $\kappa^{\log_\rho 2}$. This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).
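The recursive, fractal-like structure can be illustrated in the limiting case $\kappa \to \infty$ (the non-strongly convex setting). The sketch below assumes the closed form $h_t = 1 + \rho^{\nu(t)-1}$, with stepsizes in units of $1/L$ and $\nu(t)$ the 2-adic valuation of $t$. This form is taken from the authors' companion treatment of the non-strongly convex case and is an assumption here, since the $\kappa$-dependent stepsizes of the strongly convex schedule are not stated in this section. All function names are hypothetical. A schedule of length $2^k - 1$ is built by the doubling recursion $\pi_{j+1} = [\pi_j,\ 1+\rho^{j-1},\ \pi_j]$ and checked against the closed form.

```python
import math

# Silver ratio. ASSUMPTION: the closed form below is the kappa -> infinity
# (non-strongly convex) variant, not the kappa-dependent schedule of this paper.
RHO = 1.0 + math.sqrt(2.0)


def two_adic_valuation(t: int) -> int:
    """Largest v such that 2**v divides t (requires t >= 1)."""
    v = 0
    while t % 2 == 0:
        t //= 2
        v += 1
    return v


def silver_closed_form(t: int) -> float:
    """Assumed stepsize h_t = 1 + rho**(nu(t) - 1), in units of 1/L."""
    return 1.0 + RHO ** (two_adic_valuation(t) - 1)


def silver_recursive(k: int) -> list[float]:
    """Schedule of length 2**k - 1 via pi_{j+1} = [pi_j, 1 + rho**(j - 1), pi_j]."""
    schedule: list[float] = []  # pi_0 is the empty schedule
    for j in range(k):
        # Two copies of the previous schedule hedge around one new, longer step.
        schedule = schedule + [1.0 + RHO ** (j - 1)] + schedule
    return schedule


if __name__ == "__main__":
    sched = silver_recursive(4)
    closed = [silver_closed_form(t) for t in range(1, len(sched) + 1)]
    assert all(math.isclose(a, b) for a, b in zip(sched, closed))
    print([round(h, 3) for h in sched])
```

With `k = 3` the recursion yields $[\sqrt{2}, 2, \sqrt{2}, 2+\sqrt{2}, \sqrt{2}, 2, \sqrt{2}]$: mostly short steps punctuated by increasingly long ones, which is the non-monotone, self-similar pattern described above. The long steps exceed $2/L$, beyond which a constant stepsize would diverge on some $L$-smooth quadratics.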
