Stochastic modified equations and adaptive stochastic gradient algorithms
Qianxiao Li, Cheng Tai, Weinan E
TL;DR
This work introduces stochastic modified equations (SME), continuous-time stochastic differential equations that rigorously approximate SGD in the weak sense. The approximation allows precise analysis of both the descent dynamics and the noise-driven fluctuations, including beyond convex regimes. Combining SME with optimal control yields adaptive learning-rate and momentum policies, and from these the algorithms cSGD and cMSGD, which require less hyper-parameter tuning across diverse models and datasets. Theoretical results include first- and second-order weak approximations and stochastic asymptotic expansions, while empirical benchmarks on MNIST and CIFAR-10 show competitive performance and adaptability. Overall, SME provides a general methodology for analyzing and designing stochastic gradient algorithms with practical, model-agnostic adaptivity.
Abstract
We develop the method of stochastic modified equations (SME), in which stochastic gradient algorithms are approximated in the weak sense by continuous-time stochastic differential equations. We exploit the continuous formulation together with optimal control theory to derive novel adaptive hyper-parameter adjustment policies. The resulting algorithms perform competitively, with the added benefit of being robust to varying models and datasets. The SME approach thus provides a general methodology for the analysis and design of stochastic gradient algorithms.
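To make the idea of weak approximation concrete, the following is a minimal sketch, not the authors' code. It assumes the first-order form of the SME, dX_t = -grad f(X_t) dt + sqrt(eta * Sigma(X_t)) dW_t, where eta is the learning rate and Sigma is the covariance of the stochastic gradient. The toy objective f(x) = mean_i 0.5 * (x - a_i)^2, the data vector `a`, and the function names `simulate_sgd` and `simulate_sme` are all illustrative assumptions. For this objective the gradient noise variance is constant and equal to the variance of `a`, so the SME is an Ornstein-Uhlenbeck process, simulated here with the Euler-Maruyama scheme.

```python
import numpy as np


def simulate_sgd(a, x0, eta, n_steps, n_runs, rng):
    """Run n_runs independent SGD trajectories on the toy objective.

    Each step samples one data point a_i and uses the gradient (x - a_i).
    """
    x = np.full(n_runs, x0, dtype=float)
    for _ in range(n_steps):
        sample = rng.choice(a, size=n_runs)  # one random data point per run
        x -= eta * (x - sample)
    return x


def simulate_sme(a, x0, eta, n_steps, n_runs, rng, substeps=10):
    """Euler-Maruyama simulation of the assumed first-order SME.

    One SGD step corresponds to continuous time eta, so the SDE is
    integrated up to time n_steps * eta using a finer step eta / substeps.
    """
    mean_a = a.mean()
    sigma = a.std()  # sqrt of the gradient-noise variance (constant here)
    dt = eta / substeps
    x = np.full(n_runs, x0, dtype=float)
    for _ in range(n_steps * substeps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_runs)  # Brownian increment
        x += -(x - mean_a) * dt + np.sqrt(eta) * sigma * dw
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = np.array([-1.0, 0.0, 1.0, 2.0])  # hypothetical data set
    x_sgd = simulate_sgd(a, x0=3.0, eta=0.1, n_steps=20, n_runs=20000, rng=rng)
    x_sme = simulate_sme(a, x0=3.0, eta=0.1, n_steps=20, n_runs=20000, rng=rng)
    print("SGD mean, variance:", x_sgd.mean(), x_sgd.var())
    print("SME mean, variance:", x_sme.mean(), x_sme.var())
```

Weak approximation means that statistics of the iterates, such as the mean and variance across runs, agree up to an error that shrinks with eta; individual sample paths are not expected to match. Rerunning the comparison with a smaller eta (and the same final time n_steps * eta) should narrow the gap between the two sets of statistics.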
