TASI Lectures on Physics for Machine Learning

Jim Halverson

TL;DR

This work surveys a field-theoretic view of neural networks, organized around expressivity, statistics, and dynamics. It connects classic results, such as the universal approximation theorem and the neural network Gaussian process (NNGP) limit, to more recent results on neural tangent kernels and feature learning, and it culminates in a neural network field theory perspective that offers a potential bridge to quantum and statistical field theories. Key contributions include a clean derivation of the NNGP limit, an analysis of non-Gaussian corrections, a principled scaling framework for feature learning via the maximal update parameterization, and a mapping between neural networks and interacting field theories, including $\phi^4$ theory. The practical significance lies in providing analytic control over learning dynamics, guiding principled architecture design, and offering a framework for importing field-theoretic techniques into ML theory and ML techniques into physics.

Abstract

These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.

Paper Structure

This paper contains 19 sections, 2 theorems, 110 equations, and 2 figures.

Key Result

Theorem 2.1

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a continuous function on a compact set $K\subset \mathbb{R}^d$. Then for any $\epsilon>0$ there exists a neural network with a single hidden layer and parameters $\theta = \{w_{ij}^{(0)}, w_i^{(1)}, b^{(0)}_i, b^{(1)}\}$, whose activation function $\sigma:\mathbb{R} \to \mathbb{R}$ is non-polynomial and non-linear, such that
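
The statement above is cut off in this summary. As a sketch of how the standard form of the theorem concludes, assuming the usual single-hidden-layer parameterization (the symbol $\phi_\theta$, the hidden width $N$, and the index ranges are assumptions, not taken from the text above):

$$
\phi_\theta(x) = \sum_{i=1}^{N} w_i^{(1)}\, \sigma\Big(\sum_{j=1}^{d} w_{ij}^{(0)} x_j + b_i^{(0)}\Big) + b^{(1)},
\qquad
\sup_{x\in K} \big| f(x) - \phi_\theta(x) \big| < \epsilon .
$$

In words: for a sufficiently wide hidden layer, some choice of $\theta$ brings the network uniformly within $\epsilon$ of $f$ on $K$.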

Figures (2)

  • Figure 1: The Universal Approximation Theorem can be understood by approximating a function like $\sin(x)$ with a series of bumps.
  • Figure 2: A comparison of MLP and KAN, including their functional form and the mathematical theorem motivating the architecture.
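
The bump picture in Figure 1 can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the construction used in the lectures: it hand-builds a single-hidden-layer sigmoid network in which each pair of hidden units forms one approximate bump, and checks that the sup-norm error on $\sin(x)$ shrinks as bumps are added. The helper names `bump_network` and `sup_error`, the `sharpness` parameter, and the interval $[0, 2\pi]$ are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh so large |z| does not overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def bump_network(f, lo, hi, n_bumps, sharpness=50.0):
    """Hand-built one-hidden-layer sigmoid network approximating f on [lo, hi].

    Each bump is sigmoid(k (x - left)) - sigmoid(k (x - right)), scaled by the
    value of f at the bump's midpoint. `sharpness` (an assumed knob) sets how
    steep the bump edges are relative to the bump width.
    """
    edges = np.linspace(lo, hi, n_bumps + 1)
    width = edges[1] - edges[0]
    k = sharpness / width
    heights = f(0.5 * (edges[:-1] + edges[1:]))

    # Hidden layer: 2 * n_bumps units, one per bump edge.
    w0 = np.full(2 * n_bumps, k)                        # input weights
    b0 = -k * np.concatenate([edges[:-1], edges[1:]])   # hidden biases
    w1 = np.concatenate([heights, -heights])            # output weights

    def phi(x):
        x = np.asarray(x, dtype=float)
        hidden = sigmoid(np.outer(x, w0) + b0)  # shape (len(x), 2 * n_bumps)
        return hidden @ w1

    return phi


def sup_error(f, lo, hi, n_bumps, n_grid=2001):
    # Grid estimate of sup_x |f(x) - phi(x)| on [lo, hi].
    phi = bump_network(f, lo, hi, n_bumps)
    xs = np.linspace(lo, hi, n_grid)
    return float(np.max(np.abs(f(xs) - phi(xs))))


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, sup_error(np.sin, 0.0, 2.0 * np.pi, n))
```

The weights here are set by hand rather than trained, which keeps the example about expressivity: it shows that suitable parameters exist, not how gradient descent would find them.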

Theorems & Definitions (2)

  • Theorem 2.1: Cybenko
  • Theorem 2.2: Kolmogorov-Arnold Representation Theorem
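
For reference, the Kolmogorov-Arnold representation theorem is usually stated as follows. This is the standard textbook form, and the symbols $n$, $\Phi_q$, and $\phi_{q,p}$ are the conventional ones rather than notation confirmed by the listing above. Any continuous function $f:[0,1]^n \to \mathbb{R}$ can be written as

$$
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\Big( \sum_{p=1}^{n} \phi_{q,p}(x_p) \Big),
$$

where the $\Phi_q:\mathbb{R}\to\mathbb{R}$ and $\phi_{q,p}:[0,1]\to\mathbb{R}$ are continuous functions of a single variable. The only multivariate operation involved is addition, which is the structural feature that KANs take as their starting point.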