A mean-field limit for certain deep neural networks

Dyego Araújo, Roberto I. Oliveira, Daniel Yukimura

TL;DR

This work derives a mean-field scaling limit for deep neural networks trained by SGD, extending McKean-Vlasov (MV) descriptions from shallow to deep architectures with fully connected layers and fixed, untrained random features near the input and output. The authors develop a path-centric MV framework, prove existence and uniqueness of the limiting MV process, and show that SGD trajectories are well approximated by continuous-time gradient flows coupled to ideal particles, with explicit finite-$N$ error bounds. They introduce R-special measures to handle discontinuities in the drift, and use a contraction-based fixed-point argument to obtain a unique MV solution whose marginal laws factor in a layered way. The results connect large-$N$ training dynamics to a new family of MV PDEs for deep networks, provide quantitative convergence rates, and place the work relative to prior shallow MV limits and adjacent scaling theories for deep learning. This advances the theoretical understanding of how deep networks learn in the mean-field regime and suggests avenues for analyzing generalization and long-time behavior in complex architectures.

Abstract

Understanding deep neural networks (DNNs) is a key challenge in the theory of machine learning, with potential applications to the many fields where DNNs have been successfully used. This article presents a scaling limit for a DNN being trained by stochastic gradient descent. Our networks have a fixed (but arbitrary) number $L\geq 2$ of inner layers; $N\gg 1$ neurons per layer; full connections between layers; and fixed weights (or "random features" that are not trained) near the input and output. Our results describe the evolution of the DNN during training in the limit when $N\to +\infty$, which we relate to a mean field model of McKean-Vlasov type. Specifically, we show that network weights are approximated by certain "ideal particles" whose distribution and dependencies are described by the mean-field model. A key part of the proof is to show existence and uniqueness for our McKean-Vlasov problem, which does not seem to be amenable to existing theory. Our paper extends previous work on the $L=1$ case by Mei, Montanari and Nguyen; Rotskoff and Vanden-Eijnden; and Sirignano and Spiliopoulos. We also complement recent independent work on $L>1$ by Sirignano and Spiliopoulos (who consider a less natural scaling limit) and Nguyen (who nonrigorously derives similar results).
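The setting described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's construction: the excerpt does not give the exact parameterization, so everything concrete here is an assumption made for illustration, including the function names (`init_network`, `forward`, `sgd_step`, `loss`), the tanh activation, the Gaussian initialization, the squared loss, and the choice to average over the $N$ neurons of the previous layer with a $1/N$ factor (the usual mean-field normalization). The only features taken from the abstract are the $L$ trained inner layers of width $N$, the full connections, and the frozen weights near the input and output.

```python
import numpy as np

def init_network(rng, d_in, N, L):
    """Hypothetical layout: frozen input features A, L trained N x N
    matrices W[0..L-1], and frozen output weights b."""
    return {
        "A": rng.standard_normal((N, d_in)),  # frozen random features near the input
        "W": [rng.standard_normal((N, N)) for _ in range(L)],  # trained by SGD
        "b": rng.standard_normal(N),          # frozen weights near the output
    }

def forward(params, x):
    """Return the scalar output and the list of layer activations."""
    N = params["b"].shape[0]
    acts = [np.tanh(params["A"] @ x)]
    for W in params["W"]:
        # Assumed mean-field scaling: average (factor 1/N) over the previous layer.
        acts.append(np.tanh(W @ acts[-1] / N))
    y_hat = params["b"] @ acts[-1] / N
    return y_hat, acts

def loss(params, x, y):
    """Squared loss on a single sample (an assumed choice of loss)."""
    y_hat, _ = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def sgd_step(params, x, y, lr):
    """One SGD step on the single-sample squared loss; only W is updated."""
    N = params["b"].shape[0]
    y_hat, acts = forward(params, x)
    # Gradient of the loss with respect to the last activation vector.
    grad_act = (y_hat - y) * params["b"] / N
    new_W = list(params["W"])
    for l in reversed(range(len(new_W))):
        # acts[l + 1] = tanh(W[l] @ acts[l] / N), and tanh' = 1 - tanh**2.
        grad_pre = grad_act * (1.0 - acts[l + 1] ** 2)
        grad_W = np.outer(grad_pre, acts[l]) / N
        # Propagate to the previous layer using the weights from before the update.
        grad_act = params["W"][l].T @ grad_pre / N
        new_W[l] = params["W"][l] - lr * grad_W
    # A and b are returned unchanged: they play the role of untrained features.
    return {"A": params["A"], "W": new_W, "b": params["b"]}

# Example usage with arbitrary illustrative sizes.
rng = np.random.default_rng(0)
params = init_network(rng, d_in=3, N=50, L=2)
x, y = rng.standard_normal(3), 1.0
updated = sgd_step(params, x, y, lr=1.0)
```

In this sketch each weight's gradient is small because of the repeated $1/N$ factors, so individual weights move slowly. That matches the intuition behind tracking the distribution of weights rather than any single weight, although the precise time scales used in the paper are not stated in this excerpt.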

Paper Structure

This paper contains 55 sections, 38 theorems, 321 equations, 2 figures.

Key Result

Theorem 5.3

Under the paper's standing assumptions, the McKean-Vlasov problem (Definition 4.4) has a unique solution $\mu^{\star}_{[0,T]}$. This solution has the following structure:

Figures (2)

  • Figure 3.1: Representation of the Parameters over the Deep Neural Network with L+1 Layers.
  • Figure 4.1: Visualization of a path

Theorems & Definitions (96)

  • Definition 3.1: Parametric Deep Neural Network
  • Remark 3.2: Averages
  • Remark 3.3: Expected behavior
  • Remark 3.4: Time scales
  • Remark 3.5: Averaging the increment
  • Remark 3.6: Random features
  • Definition 4.1: Marginals
  • Definition 4.2: Measures on trajectories
  • Remark 4.3
  • Definition 4.4: McKean-Vlasov problem for DNN
  • ...and 86 more