Stochastic modified equations and adaptive stochastic gradient algorithms

Qianxiao Li; Cheng Tai; Weinan E

TL;DR

This work introduces stochastic modified equations (SME) to rigorously approximate SGD in the weak sense, enabling precise dynamical analysis of descent and noise-driven fluctuations beyond convex regimes. By combining SME with optimal control, it derives adaptive learning-rate and momentum policies, yielding robust algorithms (cSGD and cMSGD) that require less hyper-parameter tuning across diverse models and datasets. Theoretical results include first- and second-order weak approximations and stochastic asymptotic expansions, while empirical benchmarks on MNIST and CIFAR-10 validate competitive performance and adaptability. Overall, SME provides a general methodology for analyzing and designing stochastic gradient algorithms with practical, model-agnostic adaptivity.

Abstract

We develop the method of stochastic modified equations (SME), in which stochastic gradient algorithms are approximated in the weak sense by continuous-time stochastic differential equations. We exploit the continuous formulation together with optimal control theory to derive novel adaptive hyper-parameter adjustment policies. Our algorithms have competitive performance with the added benefit of being robust to varying models and datasets. This provides a general methodology for the analysis and design of stochastic gradient algorithms.
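As a rough illustration of what weak approximation means here, the sketch below compares the first two moments of SGD iterates with those of an Euler-Maruyama simulation of a stochastic differential equation. Everything specific in it is an assumption introduced for illustration and not taken from the text above: the toy objective f(x) = x^2 split into two component functions, the learning rate, the time horizon, and the first-order SME form dX = -f'(X) dt + sqrt(eta * Sigma(X)) dW, where Sigma is the variance of the stochastic gradient (equal to 4 for this toy problem).

```python
import math
import random

ETA = 0.1        # learning rate (illustrative value)
T = 1.0          # time horizon; SGD step k corresponds to time k * ETA
X0 = 1.0         # common starting point
N_PATHS = 2000   # Monte Carlo sample size

def sgd_path(rng):
    """SGD on f(x) = x^2, written as the average of (x-1)^2 - 1 and (x+1)^2 - 1."""
    x = X0
    for _ in range(round(T / ETA)):
        gamma = rng.choice((-1.0, 1.0))   # sample one of the two components
        x -= ETA * 2.0 * (x - gamma)      # gradient of (x - gamma)^2 - 1
    return x

def sme_path(rng, substeps=10):
    """Euler-Maruyama for the assumed SME dX = -2 X dt + sqrt(ETA * 4) dW."""
    dt = ETA / substeps
    x = X0
    for _ in range(round(T / dt)):
        dw = rng.gauss(0.0, math.sqrt(dt))            # Brownian increment
        x += -2.0 * x * dt + math.sqrt(ETA * 4.0) * dw
    return x

def mean_and_var(samples):
    """Sample mean and (biased) sample variance."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

rng = random.Random(0)  # fixed seed so runs are repeatable
sgd_mean, sgd_var = mean_and_var([sgd_path(rng) for _ in range(N_PATHS)])
sme_mean, sme_var = mean_and_var([sme_path(rng) for _ in range(N_PATHS)])

print(f"SGD: mean={sgd_mean:.4f}  var={sgd_var:.4f}")
print(f"SME: mean={sme_mean:.4f}  var={sme_var:.4f}")
```

The two sets of moments should agree up to an error that shrinks with the learning rate plus Monte Carlo noise. Individual sample paths are not expected to match, which is the sense in which the approximation is weak and not pathwise.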
