Error analysis for stochastic gradient optimization schemes using modified equations

Charles-Edouard Bréhier, Marc Dambrine, Nassim En-Nebbazi

TL;DR

The paper studies the convergence of stochastic gradient schemes for strongly convex objectives by comparing the discrete iterates with continuous-time modified equations. It develops two high-resolution descriptions: a first-order deterministic ODE, and a second-order SDE that depends on the modified objective $F^h=F+\frac{h}{4}\|\nabla F\|^2$, where $h$ is the time-step size. It proves uniform-in-time weak error bounds between the scheme and the solutions of these modified equations. The main contributions are Theorem 1 (uniform weak error of order $h$) and Theorem 2 (uniform weak error of order $h^2$ under stronger hypotheses), along with residual and strong error estimates and a complexity analysis in the large-time and small-time-step regimes. Numerical experiments illustrate the sharpness of the bounds and indicate when the higher-order modified equation yields computational benefits for long-time optimization tasks.
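To make the role of the modified objective concrete, here is a minimal sketch in the simplest setting one can check by hand. It is an illustration under stated assumptions, not the paper's scheme or experiments: the noise is switched off ($\sigma=0$), the objective is the one-dimensional quadratic $F(x)=\frac{\lambda}{2}x^2$, and the function names (`gd_iterates`, `flow`, `sup_errors`) are hypothetical. In this case gradient descent with step size $h$ gives $x_n=(1-h\lambda)^n x_0$, the gradient flow of $F$ decays at rate $\lambda$, and the gradient flow of $F^h=F+\frac{h}{4}\|\nabla F\|^2$ decays at rate $\lambda(1+\frac{h\lambda}{2})$. The sketch measures the largest gap over the whole time grid between the iterates and each flow.

```python
import math


def gd_iterates(lam, h, x0, n_steps):
    """Plain gradient descent on F(x) = lam * x**2 / 2 (noise-free assumption)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] - h * lam * xs[-1])  # x_{n+1} = x_n - h * F'(x_n)
    return xs


def flow(rate, t, x0):
    """Exact solution of x'(t) = -rate * x(t) with x(0) = x0."""
    return x0 * math.exp(-rate * t)


def sup_errors(lam, h, x0=1.0, T=20.0):
    """Largest gap over the grid t_n = n*h between the iterates and each flow.

    Returns (err_first, err_second):
      err_first  compares with the gradient flow of F,
      err_second compares with the gradient flow of F^h = F + (h/4) * |F'|^2,
                 whose decay rate is lam * (1 + h * lam / 2) for this quadratic.
    """
    n_steps = int(round(T / h))
    xs = gd_iterates(lam, h, x0, n_steps)
    rate_mod = lam * (1.0 + h * lam / 2.0)
    err_first = max(abs(x - flow(lam, n * h, x0)) for n, x in enumerate(xs))
    err_second = max(abs(x - flow(rate_mod, n * h, x0)) for n, x in enumerate(xs))
    return err_first, err_second


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        e1, e2 = sup_errors(lam=1.0, h=h)
        print(f"h={h:<6} first-order gap={e1:.3e}  second-order gap={e2:.3e}")
```

Halving $h$ should roughly halve the first gap and divide the second by about four, which is the deterministic analogue of the orders $h$ and $h^2$ stated above. The stochastic case treated in the paper needs weak error estimates and is not captured by this sketch.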

Abstract

We consider a class of stochastic gradient optimization schemes. Assuming that the objective function is strongly convex, we prove weak error estimates which are uniform in time for the error between the solution of the numerical scheme, and the solutions of continuous-time modified (or high-resolution) differential equations at first and second orders, with respect to the time-step size. At first order, the modified equation is deterministic, whereas at second order the modified equation is stochastic and depends on a modified objective function. We go beyond existing results where the error estimates have been considered only on finite time intervals and were not uniform in time. This allows us to then provide a rigorous complexity analysis of the method in the large time and small time-step size regimes. We provide numerical experiments to illustrate the convergence results.

Paper Structure

This paper contains 30 sections, 11 theorems, 240 equations, 5 figures.

Key Result

Theorem 1

Assume that the objective function $F$ and the diffusion coefficient $\sigma$ satisfy the basic Assumption ass:F1 and the basic Assumption ass:sigma1, respectively. There exist positive real numbers $H\in(0,h_{\max})$ and $C\in(0,\infty)$ such that, for any initial value $x_0\in\mathbb{R}^d$, fo

Figures (5)

  • Figure 1: Illustration of strong error estimates for the quadratic objective function $F$, and with $\sigma(t)=e^{-at}$ for $a=1.5$ (top left), $a=1.0$ (top right), $a=0.5$ (bottom left) and $a=0.1$ (bottom right). The values of the final time are $T\in\{2,4,8,16,32\}$.
  • Figure 2: Illustration of strong error estimates for the quadratic objective function $F$, and with $\sigma(t)=\frac{1}{1+t^a}$ for $a=2.0$ (top left), $a=1.5$ (top right), $a=1.0$ (bottom left) and $a=0.5$ (bottom right). The values of the final time are $T\in\{2,4,8,16,32\}$.
  • Figure 3: Illustration of strong error estimates for the non-quadratic objective function $F_\epsilon$, for different values of $\epsilon$, and with $\sigma(t)=e^{-t}$. The values of the final time are $T=8$ (left), $T=16$ (right) and $T=32$ (bottom).
  • Figure 4: Illustration of strong error estimates for the non-quadratic objective function $F_\epsilon$, for different values of $\epsilon$, and with $\sigma(t)=\frac{1}{1+t}$. The values of the final time are $T=8$ (left), $T=16$ (right) and $T=32$ (bottom).
  • Figure 5: Illustration of strong error estimates for the non-quadratic objective function $F_\epsilon$, for different values of $T$. Left: $\sigma(t)=e^{-t}$. Right: $\sigma(t)=\frac{1}{1+t}$. Top: $\epsilon=10^{-1}$. Bottom: $\epsilon=10^{-6}$.

Theorems & Definitions (32)

  • Example 2.1
  • Example 2.2
  • Remark 2.1
  • Remark 2.2
  • Theorem 1
  • Theorem 2
  • Remark 3.1
  • Lemma 4.1
  • Remark 4.1
  • proof
  • ...and 22 more