Step-Size Stability in Stochastic Optimization: A Theoretical Perspective
Fabian Schaipp, Robert M. Gower, Adrien Taylor
TL;DR
The paper studies how stochastic optimization methods behave as the step size $\alpha$ grows, introducing a stability index $\delta_t$ that quantifies how the suboptimality grows with $\alpha$. It develops model-based analyses for SGD, SPS, NGN, and SPP, deriving explicit forms of $\delta_t$ and proving that, for the adaptive methods, $\delta_t$ is no larger than that of SGD and often scales more favorably as $\alpha$ increases. This yields new convergence insights in the convex/non-smooth setting and explains the empirically observed robustness of SPS, NGN, and SPP to step-size choices that would require careful tuning for SGD. Experiments on nonconvex deep learning and convex regression show that the theory qualitatively tracks actual performance, supporting the practical relevance of the stability framework. The work suggests that monitoring $\delta_t$ could inform early stopping, and it motivates extending the approach to momentum-based methods and broader problem classes.
Abstract
We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides direct theoretical evidence that adaptive step-size methods, such as SPS or NGN, are more robust than SGD. This allows us to quantify the advantage of these adaptive methods beyond empirical evaluation. Finally, we show through experiments that our theoretical bound qualitatively mirrors the actual performance as a function of the step size, even for nonconvex problems.
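The robustness claim can be illustrated with a small numerical sketch. It is not the paper's experiment or its stability quantity: the toy least-squares problem, the function name `run`, and the capped Polyak-type step $\min\{\alpha, (f_i(x) - f_i^*)/\|\nabla f_i(x)\|^2\}$ with $f_i^* = 0$ are all assumptions chosen for illustration, and the SPS variant analyzed in the paper may differ.

```python
import numpy as np

def run(method, alpha, steps=100, seed=0):
    """Run SGD or a capped Polyak-type step on a toy least-squares problem.

    Hypothetical helper for illustration only. Returns the final mean loss.
    """
    rng = np.random.default_rng(seed)
    n, d = 20, 5
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d)  # consistent system, so every f_i has minimum 0
    x = np.zeros(d)
    # Divergence of SGD is expected for large alpha; silence overflow warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            i = rng.integers(n)
            r = A[i] @ x - b[i]      # residual of sample i
            g = r * A[i]             # gradient of f_i(x) = 0.5 * r**2
            g2 = g @ g
            if g2 == 0.0:
                continue             # sample already fitted, nothing to do
            if method == "sgd":
                step = alpha
            else:
                # Assumed adaptive rule: Polyak step capped at alpha (f_i^* = 0).
                step = min(alpha, 0.5 * r**2 / g2)
            x = x - step * g
        return 0.5 * np.mean((A @ x - b) ** 2)

if __name__ == "__main__":
    for alpha in (0.01, 0.1, 1.0, 10.0):
        print(alpha, run("sgd", alpha), run("sps", alpha))
```

In this toy setting the capped step never exceeds $1/(2\|a_i\|^2)$, so increasing $\alpha$ cannot destabilize it, whereas plain SGD diverges once $\alpha$ is too large. This mirrors the qualitative behavior described above under the stated assumptions only.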
