Randomised Splitting Methods and Stochastic Gradient Descent
Luke Shaw, Peter A. Whalley
TL;DR
The paper recasts minibatch stochastic gradient optimisation as a splitting method for ordinary differential equations, which explains why momentum and symmetric minibatch orderings improve long-run accuracy. It introduces the Symmetric Minibatching Strategy (SMS) and interprets both Random Reshuffling (RR) and SMS as Strang-like splittings. Combined with momentum, SMS attains a stochastic gradient bias that scales as $\mathcal{O}(h^4)$ in the stepsize $h$ (in mean-square error), compared with $\mathcal{O}(h^2)$ for existing strategies. A backward error analysis (Lie derivatives and the Baker-Campbell-Hausdorff formula), together with Lyapunov convergence guarantees, establishes the improved bias and convergence rates for SMS with momentum over strongly convex, smooth objectives, and analytical model problems support these rates. Numerical experiments on logistic regression agree with the theory and suggest that SMS with momentum is a low-cost route to faster convergence in finite-sum optimisation. The work thus gives a principled basis for designing minibatching strategies that use symmetry and momentum to reduce stochastic gradient bias.
Abstract
We explore an explicit link between stochastic gradient descent using common batching strategies and splitting methods for ordinary differential equations. From this perspective, we introduce a new minibatching strategy (called Symmetric Minibatching Strategy) for stochastic gradient optimisation which shows greatly reduced stochastic gradient bias (from $\mathcal{O}(h^2)$ to $\mathcal{O}(h^4)$ in the optimiser stepsize $h$), when combined with momentum-based optimisers. We justify why momentum is needed to obtain the improved performance using the theory of backward analysis for splitting integrators and provide a detailed analytic computation of the stochastic gradient bias on a simple example. Further, we provide improved convergence guarantees for this new minibatching strategy using Lyapunov techniques that show reduced stochastic gradient bias for a fixed stepsize (or learning rate) over the class of strongly-convex and smooth objective functions. Via the same techniques we also improve the known results for the Random Reshuffling strategy for stochastic gradient descent methods with momentum. We argue that this also leads to a faster convergence rate when considering a decreasing stepsize schedule. Both the reduced bias and efficacy of decreasing stepsizes are demonstrated numerically on several motivating examples.
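To make the batching strategies concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the symmetric strategy sweeps through a random permutation of the minibatches and then through the same permutation in reverse, by analogy with a palindromic (Strang-like) splitting. The toy objective $f(x) = \frac{1}{N}\sum_i \frac{1}{2}(x - a_i)^2$, the heavy-ball form of the momentum update, and the names `rr_order`, `symmetric_order` and `run_momentum_sgd` are all hypothetical choices made for illustration.

```python
import numpy as np

def rr_order(n_batches, rng):
    """Random Reshuffling: a fresh random permutation of the batches per sweep."""
    return rng.permutation(n_batches)

def symmetric_order(n_batches, rng):
    """Assumed symmetric schedule: a random permutation followed by its reverse.
    One sweep therefore uses twice as many gradient evaluations as rr_order."""
    perm = rng.permutation(n_batches)
    return np.concatenate([perm, perm[::-1]])

def run_momentum_sgd(a, n_batches, order_fn, h=0.1, mu=0.5, n_sweeps=200, seed=0):
    """Heavy-ball SGD on the toy objective f(x) = mean_i 0.5 * (x - a_i)^2.
    The stepsize h and momentum parameter mu are illustrative values only."""
    rng = np.random.default_rng(seed)
    batches = np.array_split(np.arange(len(a)), n_batches)
    x, v = 0.0, 0.0
    for _ in range(n_sweeps):
        for b in order_fn(n_batches, rng):
            grad = x - a[batches[b]].mean()  # gradient of the batch-averaged loss
            v = mu * v - h * grad            # momentum (velocity) update
            x = x + v                        # position update
    return x

# Distance of the final iterate from the exact minimiser, which is mean(a).
a = np.random.default_rng(1).normal(size=8)
for name, order_fn in [("RR", rr_order), ("symmetric", symmetric_order)]:
    x_final = run_momentum_sgd(a, n_batches=4, order_fn=order_fn)
    print(name, abs(x_final - a.mean()))
```

A single run shows only the distance of one final iterate from the minimiser. Estimating the bias as a function of $h$ would require averaging over many seeds and repeating the experiment for several stepsizes, with the gradient-evaluation budget matched between the two orderings.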
