TASI Lectures on Physics for Machine Learning

Jim Halverson

TL;DR

This work surveys a field-theoretic view of neural networks organized around expressivity, statistics, and dynamics. It connects classic results such as the Universal Approximation Theorem and the neural network Gaussian process (NNGP) limit to modern insights from neural tangent kernels and feature learning, culminating in a neural network field theory perspective that offers a potential bridge to quantum and statistical field theories. Key contributions include a clean derivation of the NNGP limit, an analysis of non-Gaussian corrections, a principled scaling framework for feature learning via the maximal update parameterization, and a mapping between neural networks and interacting field theories, including $\phi^4$ theory. The practical significance lies in providing analytic control over learning dynamics, guiding principled architecture design, and offering a framework for importing field-theoretic techniques into both ML theory and physics.
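
The NNGP limit and its leading non-Gaussian correction can be checked numerically. The sketch below assumes a one-hidden-layer ReLU network with i.i.d. standard normal weights, $1/\sqrt{N}$ output scaling, and no biases (choices made here for illustration, not taken from the lectures). It samples many independent initializations and measures the excess kurtosis of the output at a fixed input. Each hidden unit contributes a term $w^{(1)}_i\,\mathrm{ReLU}(w^{(0)}_i x)$ with standardized fourth moment 18, so a sum of $N$ such terms has excess kurtosis $(18-3)/N = 15/N$: the output approaches a Gaussian as the width grows, with non-Gaussianity suppressed by $1/N$.

```python
import numpy as np


def sample_outputs(width, n_nets, x=1.0, seed=0):
    """Outputs f(x) of n_nets independently initialized one-hidden-layer ReLU networks.

    Toy model (an assumption for illustration):
    f(x) = (1/sqrt(width)) * sum_i w1_i * relu(w0_i * x), all weights i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    w0 = rng.standard_normal((n_nets, width))
    w1 = rng.standard_normal((n_nets, width))
    hidden = np.maximum(w0 * x, 0.0)  # ReLU post-activations
    return (w1 * hidden).sum(axis=1) / np.sqrt(width)


def excess_kurtosis(y):
    """Sample excess kurtosis; zero for a Gaussian distribution."""
    y = y - y.mean()
    m2 = np.mean(y**2)
    m4 = np.mean(y**4)
    return m4 / m2**2 - 3.0


widths = [1, 10, 100]
kurt = {n: excess_kurtosis(sample_outputs(n, n_nets=40_000)) for n in widths}
for n in widths:
    # For this toy model the exact excess kurtosis is 15 / width.
    print(f"width={n:4d}  sample={kurt[n]:7.3f}  predicted={15 / n:7.3f}")
```

The Monte Carlo estimates fluctuate around $15/N$; the point is the $1/N$ decay, not the exact digits.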

Abstract

These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and the neural network / Gaussian process correspondence, as well as more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field-theoretic perspective familiar to theoretical physicists. I elaborate on connections between neural networks and field theory, including a neural network approach to field theory.
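
The neural tangent kernel is $\Theta(x,x') = \sum_\theta \partial_\theta f(x)\,\partial_\theta f(x')$. A minimal sketch, assuming a one-hidden-layer tanh network $f(x) = N^{-1/2}\sum_i a_i \tanh(w_i x)$ with scalar input and standard normal initialization (illustrative choices, not the lectures' setup), computes this kernel at initialization for many random seeds. The spread across seeds shrinks roughly like $1/\sqrt{N}$, consistent with the NTK becoming deterministic at infinite width.

```python
import numpy as np


def empirical_ntk(x1, x2, width, rng):
    """Empirical NTK at initialization, Theta(x1, x2) = sum_theta df/dtheta(x1) * df/dtheta(x2),
    for the toy network f(x) = (1/sqrt(width)) * sum_i a_i * tanh(w_i * x), scalar input x."""
    a = rng.standard_normal(width)  # output weights a_i ~ N(0, 1)
    w = rng.standard_normal(width)  # input weights w_i ~ N(0, 1)

    def grad(x):
        h = np.tanh(w * x)
        d_a = h / np.sqrt(width)                     # df/da_i
        d_w = a * (1.0 - h**2) * x / np.sqrt(width)  # df/dw_i, using tanh' = 1 - tanh^2
        return np.concatenate([d_a, d_w])

    return grad(x1) @ grad(x2)


rng = np.random.default_rng(0)
spread = {}
for width in [10, 100, 1000]:
    samples = np.array([empirical_ntk(0.5, -1.0, width, rng) for _ in range(200)])
    spread[width] = samples.std()
    print(f"width={width:5d}  mean={samples.mean():+.4f}  std over seeds={spread[width]:.4f}")
```

Each kernel entry is an average of $N$ i.i.d. per-neuron terms, which is why its fluctuations across initializations fall off with width.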

Paper Structure

This paper contains 19 sections, 2 theorems, 110 equations, and 2 figures.

Introduction
Expressivity of Neural Networks
Universal Approximation Theorem
Kolmogorov-Arnold Theorem
Statistics of Neural Networks
NNGP Correspondence
Non-Gaussian Processes
Symmetries
Examples
Dynamics of Neural Networks
Neural Tangent Kernel
An Exactly Solvable Model
Feature Learning
NN-FT Correspondence
Quantum Field Theory
...and 4 more sections

Key Result

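Theorem 2.1 (Cybenko)

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a continuous function on a compact set $K\subset \mathbb{R}^d$, and let $\sigma:\mathbb{R} \to \mathbb{R}$ be a non-polynomial activation function. Then for any $\epsilon>0$ there exists a neural network with a single hidden layer of width $N$,

$$f_\theta(x) = \sum_{i=1}^{N} w_i^{(1)}\, \sigma\!\left(\sum_{j=1}^{d} w_{ij}^{(0)} x_j + b_i^{(0)}\right) + b^{(1)},$$

with parameters $\theta = \{w_{ij}^{(0)}, w_i^{(1)}, b^{(0)}_i, b^{(1)}\}$, such that

$$\left|f_\theta(x) - f(x)\right| < \epsilon \quad \text{for all } x \in K.$$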
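
The bump picture behind the theorem can be made concrete. In the sketch below, a difference of two steep sigmoids, $\sigma(s(x-c_k)) - \sigma(s(x-c_{k+1}))$, acts as a smoothed indicator of the bin $[c_k, c_{k+1}]$; weighting each bump by $f$ at the bin midpoint gives a single-hidden-layer network (one hidden unit per bin edge) whose error shrinks as the bins are refined. The logistic sigmoid, the midpoint heights, the `sharpness` parameter, and the helper name `bump_network` are assumptions chosen for illustration; the theorem itself only asserts existence.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = (1 + tanh(z / 2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def bump_network(f, a, b, n_bumps, sharpness=50.0):
    """Hypothetical helper: a one-hidden-layer sigmoid network approximating f on [a, b].

    Bump k is sigma(s(x - c_k)) - sigma(s(x - c_{k+1})), weighted by f at the bin midpoint.
    """
    edges = np.linspace(a, b, n_bumps + 1)
    bin_width = edges[1] - edges[0]
    s = sharpness / bin_width                     # input weight shared by all hidden units
    heights = f(0.5 * (edges[:-1] + edges[1:]))   # f evaluated at bin midpoints

    def net(x):
        x = np.asarray(x)[..., None]
        steps = sigmoid(s * (x - edges))          # hidden units: one per bin edge
        return (heights * (steps[..., :-1] - steps[..., 1:])).sum(axis=-1)

    return net


xs = np.linspace(0.0, 2.0 * np.pi, 2001)
errors = {}
for m in [10, 100]:
    net = bump_network(np.sin, 0.0, 2.0 * np.pi, m)
    errors[m] = np.max(np.abs(net(xs) - np.sin(xs)))
    print(f"{m:4d} bumps: max error = {errors[m]:.4f}")
```

For $\sin(x)$ the maximum error is roughly half the bin width, so it drops by about a factor of ten going from 10 to 100 bumps.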

Figures (2)

Figure 1: The Universal Approximation Theorem can be understood by approximating a function like $\sin(x)$ with a series of bumps.
Figure 2: A comparison of MLP and KAN, including their functional form and the mathematical theorem motivating the architecture.
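
The structural difference in Figure 2 can be seen in a single layer: an MLP puts learnable weights on edges and a fixed nonlinearity on nodes, while a KAN puts a learnable univariate function $\phi_{q,p}$ on each edge $p \to q$ and only sums at the nodes. In this sketch, parameterizing each edge function as a linear combination of Gaussian radial basis functions is an assumption made for brevity (the original KAN proposal uses B-splines), and tanh as the MLP activation is an arbitrary choice.

```python
import numpy as np


def mlp_layer(x, W, b):
    # MLP: learnable linear weights on edges, fixed nonlinearity (tanh here) on nodes.
    return np.tanh(W @ x + b)


def kan_layer(x, coeffs, centers, scale=0.5):
    """KAN-style layer: y_q = sum_p phi_{q,p}(x_p), with a learnable function on every edge.

    Assumed parameterization: phi_{q,p}(t) = sum_k coeffs[q, p, k] * exp(-((t - centers[k]) / scale)**2).
    """
    basis = np.exp(-((x[:, None] - centers[None, :]) / scale) ** 2)  # shape (n_in, n_basis)
    return np.einsum("qpk,pk->q", coeffs, basis)


rng = np.random.default_rng(0)
n_in, n_out, n_basis = 3, 2, 8
x = rng.uniform(-1.0, 1.0, size=n_in)
centers = np.linspace(-1.0, 1.0, n_basis)

y_mlp = mlp_layer(x, rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
y_kan = kan_layer(x, rng.standard_normal((n_out, n_in, n_basis)), centers)
print(y_mlp.shape, y_kan.shape)  # (2,) (2,)
```

Both layers map $\mathbb{R}^3 \to \mathbb{R}^2$; what differs is where the learnable parameters sit.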

Theorems & Definitions (2)

Theorem 2.1: Cybenko
Theorem 2.2: Kolmogorov-Arnold Representation Theorem
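
For reference, Theorem 2.2 in its standard form says that every continuous function of several variables on the unit cube can be written using only continuous univariate functions and addition. The notation below is ours and may differ from the lectures':

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad f \in C([0,1]^n),
```

where $\Phi_q:\mathbb{R}\to\mathbb{R}$ and $\phi_{q,p}:[0,1]\to\mathbb{R}$ are continuous. A KAN stacks layers of this inner-sum/outer-function form with learnable $\phi$'s and arbitrary widths and depths.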