Generalization of Silver Stepsize Schedule to Stochastic Optimization

Luwei Bai, Yang Zeng, Baoyu Zhou

TL;DR

This paper extends the silver stepsize schedule from deterministic to stochastic optimization by designing a two-step long-stepsize schedule for stochastic gradient methods on smooth, strongly convex objectives, where the gradient noise is unbiased, has bounded variance, and is supported on a finite set. It develops a tractable stochastic Performance Estimation Problem (PEP) framework and derives a dual-feasible construction that yields explicit upper bounds; these bounds show that the proposed schedule converges faster than the classical constant stepsize 2/(M+m) when the initial optimality gap dominates the noise. The authors prove that the two-step schedule (α*,β*) exists and is unique for given (M,n,v), that it recovers the deterministic silver stepsize when n=1, and that it adapts to noise through the parameter v, which trades variance against progress. Numerical experiments corroborate the theory, showing improved performance in low-noise regimes and giving practical guidance on choosing v. The work lays a foundation for multi-step schedules and for further study of stochastic acceleration via PEP-based analysis.

Abstract

This work introduces a two-step stepsize schedule for stochastic gradient methods minimizing smooth strongly convex functions. We consider the setting where only stochastic gradient approximations are accessible, and these are unbiased, of bounded variance, and supported on a finite set. When the variance bound is sufficiently small relative to the initial optimality gap, the proposed stepsize schedule achieves better convergence performance than the well-regarded constant stepsize α = 2/(M+m), where m and M denote the strong convexity and gradient-Lipschitz parameters, respectively. Our stepsize schedule can be viewed as a generalization of the well-known two-step silver stepsize schedule in [J. M. Altschuler and P. A. Parrilo, Journal of the ACM, 72(2):1-38, 2025] from the deterministic setting to stochastic optimization.
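The setting in the abstract can be illustrated with a minimal sketch: stochastic gradient descent on a two-dimensional quadratic with curvatures m and M, unbiased noise drawn from a finite support, and a stepsize list that is cycled through. The function name `sgd_cyclic_steps`, the test quadratic, and the noise support are assumptions made for illustration. The paper's schedule (α*,β*) is not reproduced here, so only the constant baseline 2/(M+m) is run; a two-step schedule would be passed as a two-element list.

```python
import numpy as np

def sgd_cyclic_steps(x0, grad, noise_support, steps, num_iters, rng):
    """SGD that cycles through `steps` (hypothetical helper, not from the paper).

    Each iteration draws one noise vector uniformly from the finite set
    `noise_support` and adds it to the exact gradient.
    """
    x = np.array(x0, dtype=float)
    for k in range(num_iters):
        xi = noise_support[rng.integers(len(noise_support))]
        x = x - steps[k % len(steps)] * (grad(x) + xi)
    return x

# Assumed test problem: f(x) = 0.5 * x^T diag(m, M) x, minimized at the origin.
m, M = 1.0, 4.0
H = np.diag([m, M])
grad = lambda x: H @ x

# Finite-support noise with zero mean and E||xi||^2 = sigma^2 (illustrative choice).
sigma = 0.01
noise_support = sigma * np.array([[1.0, 0.0], [-1.0, 0.0]])

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.0])

# Constant-stepsize baseline named in the abstract.
alpha_const = 2.0 / (M + m)
x_const = sgd_cyclic_steps(x0, grad, noise_support, [alpha_const], 20, rng)
print("distance to optimum after 20 steps:", np.linalg.norm(x_const))
```

Without noise, one step with stepsize 2/(M+m) on this quadratic scales both coordinates by ±(M-m)/(M+m), which is the deterministic contraction factor the constant baseline is known for. Passing a list `[alpha, beta]` as `steps` would alternate the two stepsizes, with the values supplied by the user.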

Paper Structure

This paper contains 17 sections, 6 theorems, 109 equations, 5 figures.

Key Result

Proposition 2.2

(See taylor2017smooth.) Let $0 < m < M < +\infty$ and let $\mathcal{K}$ be a finite index set. Then $\{(x_k,\nabla f(x_k),f(x_k))\}_{k\in\mathcal{K}}$ is $\mathcal{F}_{m,M}$-interpolable if and only if, for all $i,j\in\mathcal{K}$, it holds that
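
The inequality in question is the smooth strongly convex interpolation condition of Taylor, Hendrickx, and Glineur. Transcribed into the $(m,M)$ notation above, it reads as follows; this is a transcription of the standard result, so the paper's own display may arrange the terms differently:

$$
f(x_i) \;\ge\; f(x_j) + \nabla f(x_j)^{\top}(x_i - x_j) + \frac{1}{2\left(1 - \tfrac{m}{M}\right)}\left( \frac{1}{M}\,\|\nabla f(x_i) - \nabla f(x_j)\|^2 + m\,\|x_i - x_j\|^2 - \frac{2m}{M}\,\big(\nabla f(x_j) - \nabla f(x_i)\big)^{\top}(x_j - x_i) \right).
$$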

Figures (5)

  • Figure 1: Two-step stepsize schedule $(\alpha^*, \beta^*)$ versus the parameter $v$ for representative values of $M$ and $n$.
  • Figure 2: Dependence of $\mu(v)$, $\tau(v)$, and $h(v,\tfrac{\sigma}{R})$ on $v$ when $(M,n) = (2,2)$.
  • Figure 3: Illustration of the quantities $\mathscr{C}$ and $\sqrt{\mathscr{U}(M,n)}$ defined in Theorem \ref{theo:lowerbound}.
  • Figure 4: Comparison of $h(v, \frac{\sigma}{R})$ and $\frac{h_{\mathrm{constant}}(R, \sigma)}{R^2}$. Columns correspond, from left to right, to the relative noise levels $\frac{\sigma}{R} = 0.1$, $0.01$, $0.001$, and $0.0001$. In each plot, the blue curve shows $h(v, \frac{\sigma}{R})$ as a function of $v\in\left[0, \tfrac{(M-1)n}{(n-1)M}\right]$, while the red dashed line shows $\frac{h_{\mathrm{constant}}(R, \sigma)}{R^2}$, the rescaled optimal objective value of PEP \ref{pep:primal_prob} under the constant stepsize $\tfrac{2}{M+1}$.
  • Figure 5: Structured matrix $\Delta\in\mathbb{R}^{(1+n+n^2)\times(1+n+n^2)}$.

Theorems & Definitions (18)

  • Remark 2.1
  • Proposition 2.2
  • Remark 2.3
  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Remark 3.3
  • Theorem 3.4
  • proof
  • ...and 8 more