Les Houches Lectures on Deep Learning at Large & Infinite Width

Yasaman Bahri; Boris Hanin; Antonin Brossollet; Vittorio Erba; Christian Keup; Rosalba Pacelli; James B. Simon

TL;DR

The Les Houches lectures explore how deep neural networks behave in the large- and infinite-width limits, linking training dynamics to Gaussian processes and kernel methods via the Neural Tangent Kernel (NTK). The first part establishes GP priors and kernel recursions for various architectures, then shows that in the infinite-width limit gradient-based training reduces to kernel regression with a fixed NTK. The subsequent lectures develop finite-width corrections through a BBGKY-like hierarchy in function space, showing how the depth-to-width ratio L/n controls non-Gaussian fluctuations and feature learning, including the catapult phenomenon at large learning rates. A complementary line of analysis derives exact finite-width statistics for ReLU networks at a single input, predicting log-normal Jacobians and exponential NTK fluctuations governed by an inverse temperature β ≈ 5L/n. Collectively, the lectures provide a coherent framework connecting initialization, training dynamics, and finite-width effects, with implications for hyperparameter tuning and for understanding when neural networks behave as kernel methods and when they learn rich representations.
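The claim that training reduces to kernel regression with a fixed NTK can be made concrete. Below is a minimal sketch, assuming the squared loss ½‖f(X) − y‖², gradient flow with learning rate η, and a tangent kernel that stays exactly constant during training. The function name `kernel_flow` and the toy model (exactly linear in its parameters, so its kernel J Jᵀ never changes) are hypothetical illustrations, not code from the lectures. The script compares small-step gradient descent with the closed-form function-space solution and checks that the t → ∞ limit is kernel regression on the initial residual.

```python
import numpy as np


def kernel_flow(theta_train, theta_test, f0_train, f0_test, y, eta, t):
    """Gradient-flow outputs at time t for a model with a fixed tangent kernel.

    Assumes loss 0.5 * ||f(X) - y||^2, learning rate eta, and a symmetric
    positive-definite train-train kernel. Returns (train outputs, test outputs).
    """
    evals, evecs = np.linalg.eigh(theta_train)
    r0 = f0_train - y                          # initial training residual
    decay = np.exp(-eta * evals * t)           # e^{-eta * lambda_k * t} per eigenmode
    # Training residual evolves as e^{-eta Theta t} r0.
    f_train = y + evecs @ (decay * (evecs.T @ r0))
    # Test outputs: f0(x) - Theta(x, X) Theta^{-1} (I - e^{-eta Theta t}) r0.
    coef = evecs @ ((1.0 - decay) / evals * (evecs.T @ r0))
    f_test = f0_test - theta_test @ coef
    return f_train, f_test


# Hypothetical toy model, exactly linear in w: f(x) = f0(x) + J(x) @ w.
rng = np.random.default_rng(0)
n_train, n_test, n_params = 8, 3, 200
J_train = rng.standard_normal((n_train, n_params)) / np.sqrt(n_params)
J_test = rng.standard_normal((n_test, n_params)) / np.sqrt(n_params)
f0_train = 0.1 * rng.standard_normal(n_train)
f0_test = 0.1 * rng.standard_normal(n_test)
y = rng.standard_normal(n_train)

# Small-step gradient descent on the parameters.
eta, dt, n_steps = 1.0, 1e-3, 5000
w = np.zeros(n_params)
for _ in range(n_steps):
    residual = f0_train + J_train @ w - y
    w -= eta * dt * (J_train.T @ residual)
gd_train, gd_test = f0_train + J_train @ w, f0_test + J_test @ w

# Closed-form function-space dynamics with the fixed kernel J J^T.
theta_train, theta_test = J_train @ J_train.T, J_test @ J_train.T
cf_train, cf_test = kernel_flow(theta_train, theta_test, f0_train, f0_test, y, eta, n_steps * dt)

# Long-time limit: interpolation on the train set, kernel regression on test points.
inf_train, inf_test = kernel_flow(theta_train, theta_test, f0_train, f0_test, y, eta, 1e6)
krr_test = f0_test - theta_test @ np.linalg.solve(theta_train, f0_train - y)
```

In an actual wide network the kernel is only approximately constant; the sketch isolates the idealized fixed-kernel dynamics that the infinite-width limit produces.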

Abstract

These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural networks, linear models, kernels, and Gaussian processes that arise in the infinite-width limit; and perturbative and non-perturbative treatments of large but finite-width networks, at initialization and after training.

Paper Structure (58 sections, 6 theorems, 215 equations, 2 figures)

Lecture 1: Yasaman Bahri
Introduction
Setup
Prior in function space
Prior in function space for deep fully-connected architectures
Prior in function space for more complex architectures
Bayesian inference for Gaussian processes
Large-depth fixed points of Neural Network Gaussian Process (NNGP) kernel recursion
Lecture 2
Introduction
Wick's theorem
Two-point correlation function
Four-point correlation function
Large-$n$ expansion
Gradient descent dynamics of optimization in the infinite-width limit
...and 43 more sections

Key Result

Theorem 4.1

Fix $L, n_0, n_{L+1}, \sigma$. Suppose that at the start of training we initialize as in Eq. (E:init).
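Theorem 4.1 concerns the regime where, at fixed depth and infinite width, the network at initialization is a Gaussian process and training is governed by a deterministic NTK. For ReLU networks both kernels can be computed layer by layer. The sketch below is a minimal illustration, assuming the NTK parametrization, weight variance σ_w²/n_l, bias variance σ_b², and the standard closed-form (arc-cosine) Gaussian expectations for ReLU and its derivative. The function name `relu_nngp_ntk` and the defaults σ_w² = 2, σ_b² = 0 are our own choices; the lectures' initialization conventions may differ.

```python
import numpy as np


def relu_nngp_ntk(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Infinite-width NNGP kernel Sigma and NTK Theta for a ReLU MLP.

    `depth` is the number of hidden layers. Assumes NTK parametrization and
    nonzero input rows when sigma_b2 == 0 (otherwise the correlation is undefined).
    """
    n0 = X.shape[1]
    sigma = sigma_w2 * (X @ X.T) / n0 + sigma_b2   # Sigma^(1): first-layer preactivations
    theta = sigma.copy()                            # Theta^(1) = Sigma^(1)
    for _ in range(depth):
        d = np.sqrt(np.diag(sigma))
        norm = np.outer(d, d)
        cos = np.clip(sigma / norm, -1.0, 1.0)
        np.fill_diagonal(cos, 1.0)                  # exact self-correlation
        ang = np.arccos(cos)
        # E[relu(u) relu(v)] for Gaussian (u, v) with covariance sigma (degree-1 arc-cosine).
        e_act = norm * (np.sin(ang) + (np.pi - ang) * cos) / (2.0 * np.pi)
        # E[relu'(u) relu'(v)] (degree-0 arc-cosine).
        e_der = (np.pi - ang) / (2.0 * np.pi)
        sigma_dot = sigma_w2 * e_der
        sigma = sigma_w2 * e_act + sigma_b2         # Sigma^(l+1)
        theta = theta * sigma_dot + sigma           # Theta^(l+1) = Theta^(l) * SigmaDot + Sigma
    return sigma, theta


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))
X *= np.sqrt(X.shape[1]) / np.linalg.norm(X, axis=1, keepdims=True)  # ||x||^2 = n0
sigma, theta = relu_nngp_ntk(X, depth=3)
```

With σ_w² = 2 and σ_b² = 0 the diagonal of the NNGP kernel is preserved across layers, so each layer adds one copy of it to the NTK diagonal: for inputs with ‖x‖² = n_0 and three hidden layers, Σ(x, x) = 2 and Θ(x, x) = 8.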

Figures (2)

Figure 1: Phase diagram in the $(\sigma_b^2, \sigma_w^2)$ plane for fixed points of the NNGP recursion relationship with nonlinearity $\phi = \tanh$, showing ordered and chaotic phases separated by a critical line. Figure reproduced from bahri2020; see also schoenholz2017.
Figure 2: Partial Phase Diagram for Fully Connected Networks with NTK Initialization
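The ordered/chaotic criterion behind Figure 1 can be sketched numerically. Assuming the mean-field conventions of schoenholz2017 as we understand them, one first finds the fixed point q* of the diagonal NNGP map q ↦ σ_w² E[tanh(√q z)²] + σ_b², then computes the slope of the correlation map at c = 1, χ₁ = σ_w² E[tanh′(√q* z)²], with χ₁ < 1 ordered, χ₁ > 1 chaotic, and χ₁ = 1 the critical line. The helper names and the quadrature grid are illustrative choices.

```python
import numpy as np

# Dense grid for Gaussian expectations E_z[g(z)], z ~ N(0, 1).
_Z = np.linspace(-10.0, 10.0, 20001)
_W = np.exp(-0.5 * _Z**2) / np.sqrt(2.0 * np.pi) * (_Z[1] - _Z[0])


def gauss_mean(values):
    """Riemann-sum approximation of E_z[g(z)] given values g(_Z)."""
    return float(np.sum(_W * values))


def tanh_fixed_point(sigma_w2, sigma_b2, n_iter=2000):
    """Iterate q -> sigma_w2 * E[tanh(sqrt(q) z)^2] + sigma_b2 from q = 1."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w2 * gauss_mean(np.tanh(np.sqrt(q) * _Z) ** 2) + sigma_b2
    return q


def chi1(sigma_w2, sigma_b2):
    """Slope of the correlation map at c = 1: sigma_w2 * E[tanh'(sqrt(q*) z)^2]."""
    q_star = tanh_fixed_point(sigma_w2, sigma_b2)
    sech2 = 1.0 - np.tanh(np.sqrt(q_star) * _Z) ** 2   # tanh' = sech^2
    return sigma_w2 * gauss_mean(sech2**2)


def phase(sigma_w2, sigma_b2):
    return "ordered" if chi1(sigma_w2, sigma_b2) < 1.0 else "chaotic"
```

Scanning `phase` over a grid of (σ_b², σ_w²) values would trace out the two regions of the figure; points where χ₁ ≈ 1 approximate the critical line. Convergence of the fixed-point iteration slows near that line, so `n_iter` may need to grow there.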

Theorems & Definitions (11)

Definition 1: Gaussian process
Proof (informal)
Theorem 4.1: GP + NTK Regime for Networks at Fixed Depth and Infinite Width
Theorem 4.2
Proposition 4.3
Lemma 4.4
Proof
Theorem 5.1: Meta-Claim
Proposition 5.2: Exact Matrix Model Underlying Random ReLU Networks
Proof (sketch)
...and 1 more
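The combination 5L/n that governs finite-width ReLU fluctuations at a single input already shows up in the simplest non-Gaussian statistic of the output. Assuming He initialization (weight variance 2/n_l, zero biases) and an input normalized so that first-layer preactivations have unit variance, a short calculation (ours, not quoted from the lectures) gives E[z_{l+1}⁴] = (1 + 5/n_l) E[z_l⁴] for one coordinate while E[z²] stays exactly 1, so E[z⁴]/(3E[z²]²) = ∏_l (1 + 5/n_l), which is close to e^{5L/n} for equal widths and large n. The Monte Carlo sketch below checks this. It uses the fact that for a fixed vector v, W v with i.i.d. N(0, 2/n) entries is distributed as N(0, (2/n)‖v‖² I); this is a simplification in the spirit of, but not necessarily identical to, the exact matrix model of Proposition 5.2. Function names are hypothetical.

```python
import numpy as np


def exact_kurtosis_ratio(widths):
    """E[z^4] / (3 E[z^2]^2) for one output coordinate: prod over hidden layers of (1 + 5/n_l)."""
    return float(np.prod([1.0 + 5.0 / n for n in widths]))


def sample_outputs(widths, n_samples, rng):
    """Sample one output coordinate of random He-initialized ReLU nets at a fixed input.

    widths: hidden-layer widths n_1, ..., n_L. Biases are zero and the input is
    normalized so that each first-layer preactivation is N(0, 1).
    """
    z = rng.standard_normal((n_samples, widths[0]))         # z^(1), one row per network
    for n_in, n_out in zip(widths, list(widths[1:]) + [1]):
        act = np.maximum(z, 0.0)                             # ReLU activations
        # W @ act with W_ij ~ N(0, 2 / n_in) is N(0, (2 / n_in) * ||act||^2 * I).
        scale = np.sqrt(2.0 / n_in * np.sum(act**2, axis=1, keepdims=True))
        z = scale * rng.standard_normal((n_samples, n_out))
    return z[:, 0]


rng = np.random.default_rng(0)
widths = [20, 20, 20]
out = sample_outputs(widths, 200_000, rng)
second = np.mean(out**2)
mc_ratio = np.mean(out**4) / (3.0 * second**2)
exact = exact_kurtosis_ratio(widths)                         # 1.25**3 = 1.953125
print(f"E[z^2] ~ {second:.3f}, MC ratio {mc_ratio:.3f}, exact {exact:.3f}, "
      f"exp(5L/n) {np.exp(5 * len(widths) / widths[0]):.3f}")
```

Increasing depth at fixed width drives the ratio up geometrically, consistent with the heavy-tailed, log-normal-like behaviour described in the TL;DR; as L/n → 0 it tends to 1, the Gaussian value.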
