Error analysis for stochastic gradient optimization schemes using modified equations
Charles-Edouard Bréhier, Marc Dambrine, Nassim En-Nebbazi
TL;DR
The paper studies the convergence of stochastic gradient schemes for strongly convex objectives by comparing the discrete updates with continuous-time modified equations. It considers two high-resolution descriptions: a first-order deterministic ODE and a second-order SDE with a modified objective $F^h=F+\frac{h}{4}\|\nabla F\|^2$, where $h$ is the time-step size, and proves uniform-in-time weak error bounds between the scheme and these continuous-time models. The main results are Theorem 1 (uniform weak error of order $h$) and Theorem 2 (uniform weak error of order $h^2$ under stronger hypotheses), along with residual and strong error estimates and a complexity analysis in the large-time and small-time-step regimes. Numerical experiments complement the results, validating the sharpness of the bounds and indicating when the higher-order modified equation yields computational benefits for long-time optimization tasks.
Abstract
We consider a class of stochastic gradient optimization schemes. Assuming that the objective function is strongly convex, we prove weak error estimates which are uniform in time for the error between the solution of the numerical scheme and the solutions of continuous-time modified (or high-resolution) differential equations at first and second orders with respect to the time-step size. At first order, the modified equation is deterministic, whereas at second order the modified equation is stochastic and depends on a modified objective function. We go beyond existing results, where the error estimates have been considered only on finite time intervals and were not uniform in time. This allows us to provide a rigorous complexity analysis of the method in the large-time and small-time-step-size regimes. We provide numerical experiments to illustrate the convergence results.
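The orders $h$ and $h^2$ can be checked by hand on a toy problem. The sketch below is not taken from the paper's experiments; it assumes a one-dimensional quadratic objective $F(x)=\frac{a}{2}x^2$, a scheme of the form $x_{n+1}=x_n-h\,(a x_n+\text{noise})$ with zero-mean additive noise, and the test function $\varphi(x)=x$, so that only means need to be compared. It also assumes that the second-order modified equation has drift $-\nabla F^h$ with $F^h=F+\frac{h}{4}\|\nabla F\|^2$ and a state-independent diffusion term, which leaves the mean unaffected in this linear setting. All function names are hypothetical.

```python
import math

def sgd_mean(x0, a, h, n):
    """Exact mean after n steps of x_{k+1} = x_k - h*(a*x_k + noise), zero-mean noise."""
    return (1.0 - h * a) ** n * x0

def first_order_mean(x0, a, t):
    """Solution at time t of the gradient flow x' = -F'(x) with F(x) = a*x**2/2."""
    return math.exp(-a * t) * x0

def second_order_mean(x0, a, h, t):
    """Mean at time t when the drift is -(F^h)'(x), F^h = F + (h/4)*F'(x)**2.

    For the quadratic F, (F^h)'(x) = (a + h*a**2/2) * x; additive noise
    (an assumption of this sketch) does not change the mean.
    """
    a_h = a + 0.5 * h * a * a
    return math.exp(-a_h * t) * x0

def sup_errors(h, a=1.0, x0=1.0, horizon=20.0):
    """Largest error in the mean over all grid times t_n = n*h up to the horizon."""
    n_max = round(horizon / h)
    err1 = err2 = 0.0
    for n in range(n_max + 1):
        t = n * h
        m = sgd_mean(x0, a, h, n)
        err1 = max(err1, abs(m - first_order_mean(x0, a, t)))
        err2 = max(err2, abs(m - second_order_mean(x0, a, h, t)))
    return err1, err2

if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        e1, e2 = sup_errors(h)
        print(f"h={h:<6} first-order error={e1:.3e}  second-order error={e2:.3e}")
```

In this toy setting, halving $h$ should roughly halve the first-order error and divide the second-order error by about four. Because both continuous-time solutions and the scheme contract toward the minimizer, the maximum is attained at moderate times and does not grow with the horizon, which is the behavior that uniform-in-time bounds describe.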
