Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
Phan-Minh Nguyen
TL;DR
The paper develops a mean-field (MF) limit for multilayer neural networks trained with stochastic gradient descent (SGD), showing that as the width grows, the learning dynamics converge to a nontrivial, width-independent regime. It introduces an MF formalism built on marginal uniformity and self-averaging, in which intermediate-layer neurons are represented by stochastic kernels and only their conditional expectations are tracked, and it derives forward, backward, and evolution equations that link finite networks to deterministic MF dynamics. Theoretical results on the statics, together with extensive experiments on isotropic Gaussian tasks, MNIST, CIFAR-10, and CNN-like architectures, corroborate the MF predictions by demonstrating learning behavior that is nontrivial yet invariant to width. The work extends the established two-layer MF limit to deep architectures, offering a principled framework for analyzing, and potentially guiding the design of, wide multilayer networks.
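The self-averaging idea is easiest to see in the two-layer case that this work extends. The sketch below is a minimal illustration, not the paper's construction. It assumes the 1/n output scaling commonly used for two-layer MF limits, a ReLU activation, first-layer weights w_i ~ N(0, I_d), and output weights a_i ~ N(1, 1). The function names are hypothetical. Because the output is an empirical average over n i.i.d. neurons, its spread across random initializations shrinks like 1/sqrt(n). Meanwhile its value approaches the deterministic limit E[a] E[relu(z)] = 1/sqrt(2*pi) for a unit-norm input.

```python
import numpy as np

def mf_two_layer_output(x, n, rng):
    """Output of a two-layer network under 1/n (mean-field) scaling.

    Hypothetical initialization: w_i ~ N(0, I_d) and a_i ~ N(1, 1).
    """
    d = x.shape[0]
    w = rng.standard_normal((n, d))       # one row of first-layer weights per neuron
    a = 1.0 + rng.standard_normal(n)      # output weights with nonzero mean
    h = np.maximum(w @ x, 0.0)            # ReLU activations of the n hidden neurons
    return np.mean(a * h)                 # (1/n) * sum_i a_i * relu(<w_i, x>)

def outputs_across_seeds(n, num_seeds=20, d=10):
    """Network outputs on one fixed input, for num_seeds independent initializations."""
    x = np.ones(d) / np.sqrt(d)           # fixed unit-norm input, so <w_i, x> ~ N(0, 1)
    return np.array([mf_two_layer_output(x, n, np.random.default_rng(s))
                     for s in range(num_seeds)])

if __name__ == "__main__":
    limit = 1.0 / np.sqrt(2.0 * np.pi)    # E[a] * E[relu(z)] with z ~ N(0, 1)
    for n in (10, 100, 1000, 10000):
        outs = outputs_across_seeds(n)
        print(f"n={n:6d}  mean={outs.mean():.4f}  std={outs.std():.4f}  limit={limit:.4f}")
```

As n grows, the printed standard deviation should fall roughly tenfold for each hundredfold increase in width, and the mean should settle near the limit.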
Abstract
Can multilayer neural networks -- typically constructed as highly complex structures with many nonlinearly activated neurons across layers -- behave in a non-trivial way that yet simplifies away a major part of their complexities? In this work, we uncover a phenomenon in which the behavior of these complex networks -- under suitable scalings and stochastic gradient descent dynamics -- becomes independent of the number of neurons as this number grows sufficiently large. We develop a formalism in which this many-neurons limiting behavior is captured by a set of equations, thereby exposing a previously unknown operating regime of these networks. While the current pursuit is mathematically non-rigorous, it is complemented with several experiments that validate the existence of this behavior.
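The claim that SGD dynamics become independent of the number of neurons can be probed numerically in the simplest (two-layer) setting. The sketch below is a hypothetical illustration. It is not the paper's experimental protocol or its multilayer scaling. It assumes a network f(x) = (1/n) sum_i a_i relu(<w_i, x>), online SGD on squared loss with Gaussian inputs, and a made-up tanh target. The per-neuron learning rate is multiplied by n, which cancels the 1/n in each neuron's gradient so that every neuron moves by O(1) per step at any width. All inputs and targets are drawn from a stream shared across widths, so only the initialization varies. Under these assumptions, the final test loss should fluctuate less across seeds as n grows. The name `train_mf_sgd` and all hyperparameters are hypothetical choices.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def train_mf_sgd(n, seed, steps=300, d=5, lr=0.05, data_seed=0):
    """Online SGD on f(x) = (1/n) sum_i a_i relu(<w_i, x>) with mean-field step sizes.

    Returns the final test loss. Target, data, and hyperparameters are hypothetical.
    """
    rng = np.random.default_rng(seed)          # initialization randomness only
    w = rng.standard_normal((n, d))            # first-layer weights, one row per neuron
    a = 1.0 + rng.standard_normal(n)           # output weights
    data = np.random.default_rng(data_seed)    # identical data stream for every width
    teacher = np.ones(d) / np.sqrt(d)          # hypothetical target direction
    for _ in range(steps):
        x = data.standard_normal(d)
        y = np.tanh(teacher @ x)               # hypothetical target function
        pre = w @ x
        h = relu(pre)
        err = np.mean(a * h) - y               # f(x) - y
        # Gradients of 0.5 * (f - y)^2 multiplied by n (the mean-field step scaling).
        grad_a = err * h
        grad_w = (err * a * (pre > 0))[:, None] * x[None, :]
        a -= lr * grad_a
        w -= lr * grad_w
    test = np.random.default_rng(12345)        # fixed held-out test set
    X = test.standard_normal((500, d))
    Y = np.tanh(X @ teacher)
    preds = relu(X @ w.T) @ a / n
    return 0.5 * np.mean((preds - Y) ** 2)

if __name__ == "__main__":
    for n in (20, 200, 2000):
        losses = [train_mf_sgd(n, seed) for seed in range(5)]
        print(f"n={n:5d}  mean loss={np.mean(losses):.4f}  std across seeds={np.std(losses):.5f}")
```

Scaling the step size by n is the design choice that matters here. Without it, each neuron's update would shrink like 1/n, and the dynamics would freeze as the width grows instead of converging to a width-independent trajectory.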
