Generalization of Silver Stepsize Schedule to Stochastic Optimization
Luwei Bai, Yang Zeng, Baoyu Zhou
TL;DR
This paper extends the silver stepsize concept from deterministic optimization to stochastic settings. It designs a two-step schedule of long stepsizes for stochastic gradient methods on smooth, strongly convex objectives, assuming unbiased gradient noise with finite support and bounded variance. It develops a tractable stochastic Performance Estimation Problem (PEP) framework and derives a dual-feasible construction that yields explicit upper bounds. These bounds show that the proposed schedule converges faster than the classical constant stepsize 2/(M+m) when the initial optimality gap dominates the noise. The authors prove that the two-step schedule (α*, β*) exists and is unique for given (M, n, v), that it recovers the deterministic silver stepsize when n = 1, and that it adapts to noise through the parameter v, balancing variance against progress. Numerical experiments corroborate the theory, showing improved performance in low-noise regimes and giving practical guidance on choosing v for faster convergence. The work lays a foundation for extensions to multi-step schedules and for further study of stochastic acceleration via PEP-based analysis.
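The baseline in these comparisons is the classical constant stepsize. As a short worked example (the textbook quadratic case, not the paper's PEP analysis), write m and M for the strong convexity and gradient-Lipschitz parameters and consider deterministic gradient descent x_{k+1} = x_k - α∇f(x_k) on a quadratic whose Hessian H has eigenvalues in [m, M]. The stepsize 2/(M+m) is exactly the one that minimizes the worst-case per-step contraction factor:

```latex
% Quadratic case: f(x) = \tfrac12 (x - x^*)^\top H (x - x^*), \quad m I \preceq H \preceq M I
x_{k+1} - x^* = (I - \alpha H)(x_k - x^*)
\;\Longrightarrow\;
\|x_{k+1} - x^*\| \le \max_{\lambda \in [m, M]} |1 - \alpha \lambda| \, \|x_k - x^*\|.

% The maximum is attained at \lambda = m or \lambda = M; balancing 1 - \alpha m = \alpha M - 1 gives
\alpha = \frac{2}{M + m},
\qquad
\max_{\lambda \in [m, M]} |1 - \alpha \lambda| = \frac{M - m}{M + m}.
```

For general (non-quadratic) m-strongly convex functions with M-Lipschitz gradient, deterministic gradient descent with α = 2/(M+m) achieves the same per-step factor (M-m)/(M+m) on the distance to the minimizer.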
Abstract
This work introduces a two-step stepsize schedule for stochastic gradient methods minimizing smooth strongly convex functions. We consider the setting where only stochastic gradient approximations are accessible; these approximations are unbiased, have bounded variance, and are supported on a finite set. When the variance bound is smaller than a certain multiple of the initial optimality gap, the proposed stepsize schedule achieves better convergence performance than the widely used constant stepsize α = 2/(M+m), where m and M denote the strong convexity and gradient-Lipschitz parameters, respectively. Our stepsize schedule can be viewed as a generalization of the well-known two-step silver stepsize schedule in [J. M. Altschuler and P. A. Parrilo, Journal of the ACM, 72(2):1-38, 2025] from the deterministic setting to stochastic optimization.
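The problem setting above can be made concrete with a small simulation. The sketch below is a minimal, hypothetical illustration and not the authors' code. It assumes a diagonal quadratic objective with Hessian eigenvalues in [m, M]. It also assumes a noise model built from a finite set of centered vectors sampled uniformly, so the oracle is unbiased with variance exactly sigma^2. The schedule alternates two user-supplied stepsizes alpha and beta. The helper names (make_finite_noise, run_two_step_sgd, expected_gap) and the parameter support_size are illustrative assumptions. The sketch does not compute the paper's (α*, β*); setting alpha = beta = 2/(M+m) recovers the constant-stepsize baseline.

```python
import numpy as np


def make_finite_noise(support_size, dim, sigma, rng):
    """Hypothetical finite-support noise: each of `support_size` vectors has prob. 1/support_size.

    The vectors are centered so the noise has mean zero (unbiased oracle) and rescaled so
    that E||xi||^2 = sigma^2 (bounded variance). With support_size = 1 the noise is zero.
    """
    vecs = rng.standard_normal((support_size, dim))
    vecs -= vecs.mean(axis=0)  # enforce zero mean over the finite support
    scale = np.sqrt(np.mean(np.sum(vecs**2, axis=1)))
    if scale > 0:
        vecs *= sigma / scale  # match the assumed variance bound exactly
    return vecs


def run_two_step_sgd(diag, x0, alpha, beta, num_steps, noise, rng):
    """Stochastic gradient method on f(x) = 0.5 * sum(diag * x**2), alternating alpha and beta."""
    x = x0.copy()
    for k in range(num_steps):
        xi = noise[rng.integers(len(noise))]   # sample one support point uniformly
        grad = diag * x + xi                   # unbiased stochastic gradient
        step = alpha if k % 2 == 0 else beta   # two-step periodic schedule
        x = x - step * grad
    return x


def expected_gap(diag, x0, alpha, beta, num_steps, noise, trials, seed=0):
    """Monte Carlo estimate of E[f(x_K) - f*]; here x* = 0 and f* = 0."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        x = run_two_step_sgd(diag, x0, alpha, beta, num_steps, noise, rng)
        gaps.append(0.5 * np.sum(diag * x**2))
    return float(np.mean(gaps))


if __name__ == "__main__":
    m, M = 1.0, 10.0
    diag = np.linspace(m, M, 5)     # Hessian eigenvalues spread over [m, M]
    x0 = np.ones(5)
    noise = make_finite_noise(support_size=4, dim=5, sigma=0.1,
                              rng=np.random.default_rng(0))
    const = 2.0 / (M + m)           # classical constant stepsize baseline
    baseline = expected_gap(diag, x0, const, const, num_steps=20,
                            noise=noise, trials=200)
    print(f"constant 2/(M+m): mean gap after 20 steps = {baseline:.3e}")
```

To compare a candidate two-step pair against the baseline, call expected_gap with that (alpha, beta) under the same noise and number of steps; how the paper chooses (α*, β*) is not reproduced here.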
