Convergence of continuous-time stochastic gradient descent with applications to deep neural networks

Gabor Lugosi; Eulalia Nualart

TL;DR

The paper investigates the convergence of a continuous-time stochastic gradient descent model, expressed as $\mathrm{d}w_t=-\nabla f(w_t)\,\mathrm{d}t+\sqrt{\eta}\,\sigma(w_t)\,\mathrm{d}B_t$, to global minima of the population loss $f(w)=\mathbb{E}[\ell(w,Z)]$ under stochastic noise. Building on Chatterjee's deterministic criteria, it develops an Itô-calculus framework with stopping times and Lyapunov-type quantities, yielding explicit conditions on $f$ and $\sigma$ (via $A_{\min}, G_{\max}, B_{\max}, \theta$) under which convergence occurs with positive probability and at an exponential rate when initialized near a global minimum. The results are specialized to overparameterized neural networks, showing that with smooth activations and bounded inputs one can satisfy the required inequalities, and that high-probability convergence to the global minimum set $\mathcal{S}$ can be achieved with sufficiently large final-layer weights and small noise. The work thus provides a rigorous link between continuous-time SGD dynamics, PL-type behavior, and NTK-inspired conditions in the population-risk setting, offering theoretical justification for efficient learning in deep networks under stochastic optimization.

Abstract

We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general sufficient conditions for convergence, extending the results that Chatterjee (2022) established for (nonstochastic) gradient descent. We show how the main result applies to the training of overparametrized neural networks.
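To make the continuous-time model concrete, the following is a minimal simulation sketch of the SDE quoted in the TL;DR, using an Euler–Maruyama discretization. The discretization, the toy loss $f(w)=\tfrac12\|w\|^2$, the noise coefficient $\sigma(w)=\|w\|\,I$ (chosen so that the noise vanishes at the minimum), and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): f(w) = 0.5 * ||w||^2,
# whose unique global minimum is w = 0 with f(0) = 0.
def f(w):
    return 0.5 * float(np.dot(w, w))

def grad_f(w):
    return w

# Assumed noise coefficient: scales with ||w||, so it vanishes at the minimum.
def sigma(w):
    return np.linalg.norm(w) * np.eye(w.size)

def euler_maruyama(w0, eta=0.01, step=0.01, n_steps=1000, seed=0):
    """Discretize dw = -grad f(w) dt + sqrt(eta) * sigma(w) dB."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    losses = [f(w)]
    for _ in range(n_steps):
        # Brownian increment over one time step: N(0, step * I)
        dB = np.sqrt(step) * rng.standard_normal(w.size)
        w = w - grad_f(w) * step + np.sqrt(eta) * (sigma(w) @ dB)
        losses.append(f(w))
    return w, np.array(losses)

w_final, losses = euler_maruyama(np.array([1.0, -0.5, 0.25]))
print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
```

On this toy problem the loss along the simulated path decays roughly exponentially in time, which is the qualitative behavior the summary describes. It is only an illustration and says nothing about the paper's actual constants or assumptions.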

Paper Structure (9 sections, 8 theorems, 109 equations)

Introduction
Preliminaries and assumptions
Notation
Assumptions
Preliminaries on Itô's stochastic calculus
Convergence of the continuous-time sgd
Related literature
Application to deep neural networks
Proofs

Key Result

Lemma 3

Consider the SDE (w2), initialized at some $w_0 \in \mathbb{R}^D$, and suppose that Assumptions a1 and a1B hold. If $f(w_t)=0$ for some $t \in [0,T)$, then $T=\infty$ and $w_s = w_t$ for all $s > t$.
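The absorbing behavior in Lemma 3 can be illustrated numerically. The sketch below assumes a noise coefficient that vanishes wherever the loss is zero. This is our assumption for illustration, since the lemma's Assumptions are not restated here. The loss $f(w)=\tfrac12(w_1w_2-1)^2$, whose zero set is a curve rather than a single point, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy loss with a whole curve of global minima {w1 * w2 = 1}.
def residual(w):
    return w[0] * w[1] - 1.0

def f(w):
    return 0.5 * residual(w) ** 2

def grad_f(w):
    return residual(w) * np.array([w[1], w[0]])

# Assumed noise coefficient: proportional to |residual|, so it is zero when f is zero.
def sigma(w):
    return abs(residual(w)) * np.eye(2)

def em_step(w, rng, eta=0.01, step=0.01):
    """One Euler-Maruyama step of dw = -grad f(w) dt + sqrt(eta) * sigma(w) dB."""
    dB = np.sqrt(step) * rng.standard_normal(2)
    return w - grad_f(w) * step + np.sqrt(eta) * (sigma(w) @ dB)

rng = np.random.default_rng(0)

# Start exactly on the zero-loss set: drift and noise both vanish there.
w = np.array([2.0, 0.5])
for _ in range(100):
    w = em_step(w, rng)
print(w)  # stays at [2.0, 0.5]

# Away from the zero-loss set the iterate does move.
w_off = em_step(np.array([1.0, 2.0]), rng)
```

In this discretized toy model, a path that reaches a zero-loss point has both the drift and the noise switched off, so it remains there, mirroring the conclusion $w_s = w_t$ for all $s > t$.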

Theorems & Definitions (17)

Lemma 3
Theorem 4: Multi-dimensional Itô formula
Remark 5
Lemma 6
Lemma 7
Lemma 8
Theorem 9
Remark 10
Remark 11
Remark 12
...and 7 more
