Les Houches Lectures on Deep Learning at Large & Infinite Width
Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon
TL;DR
The Les Houches lectures explore how deep neural networks behave in the large- and infinite-width limits, linking training dynamics to Gaussian processes and kernel methods via the Neural Tangent Kernel (NTK). The first part establishes Gaussian process (GP) priors and kernel recursions for various architectures, then shows how gradient-based training becomes kernel regression with a fixed NTK in the infinite-width limit. The subsequent lectures develop finite-width corrections through a BBGKY-like function-space hierarchy, revealing how the depth-to-width ratio L/n controls non-Gaussian fluctuations and feature learning, including the catapult phenomenon at large learning rates. A complementary line of work focuses on exact finite-width statistics for ReLU networks at a single input, predicting log-normal Jacobians and exponential NTK fluctuations governed by an inverse temperature β ≈ 5L/n. Collectively, the lectures provide a coherent framework connecting initialization, training dynamics, and finite-width effects, with implications for hyperparameter tuning and for understanding when neural networks behave as kernel methods and when they learn rich representations.
Abstract
These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural networks, linear models, kernels, and Gaussian processes that arise in the infinite-width limit; and perturbative and non-perturbative treatments of large but finite-width networks, at initialization and after training.
