A Mathematical Guide to Operator Learning

Nicolas Boullé; Alex Townsend

TL;DR

This survey frames operator learning as learning the action of a (potentially nonlinear) operator between function spaces, and views approaches such as DeepONets, Fourier neural operators, Green's function learning, and graph neural operators through a common lens rooted in numerical linear algebra. It details how discretization turns operators into structured matrices (low-rank, circulant, banded, hierarchical) and how these structures guide architecture choices and efficiency, including multiscale architectures such as multipole graph neural operators (MGNOs) and spectral methods. The authors discuss generating training data from source terms sampled from Gaussian processes, the choice of numerical solver (finite element, finite difference, or spectral), and practical optimization considerations (losses, optimizers, convergence), while highlighting zero-shot super-resolution and data-efficiency findings. They also outline challenges and directions (software, theory, physical properties, and real-world deployments), emphasizing interpretability and the discovery of unknown physics through operator learning.

Abstract

Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point-of-view of numerical linear algebra.

Paper Structure (34 sections, 44 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
What is a neural operator?
Where is operator learning relevant?
Speeding up numerical PDE solvers.
Parameter optimization.
Benchmarking new techniques.
Discovering unknown physics.
Organization of the paper
From numerical linear algebra to operator learning
Low rank matrix recovery
Circulant matrix recovery
Banded matrix recovery
Hierarchical low rank matrix recovery
Neural operator architectures
Deep operator networks
...and 19 more sections

Figures (11)

Figure 1: Illustrating the role of operator learning in SciML. Operator learning aims to discover or approximate an unknown operator $\mathcal{A}$, which often corresponds to the solution operator of an unknown PDE. In contrast, PDE discovery aims to discover coefficients of the PDE itself, while PDE solvers aim to solve a known PDE using ML techniques.
Figure 2: (a) A generic $12\times 12$ banded matrix with bandwidth $2$, with a maximum of $5$ diagonals, and the corresponding graph (b). Here, each vertex is a column of the banded matrix, and two vertices are connected if their corresponding columns do not have disjoint support. The coloring number of $5$ determines the minimum number of matrix-vector products needed to recover the structure. Generally, an $N\times N$ banded matrix with bandwidth $w$ can be recovered in $2w+1$ matrix-vector products.
Figure 3: (a) A HODLR matrix $H_{N,k}$ after three levels of partitioning. Since $H_{N,k}$ is a rank-$k$ HODLR matrix, $U_i$, $V_i$, $W_i$, and $Z_i$ have at most $k$ columns. The matrices $A_{ii}$ are themselves rank-$k$ HODLR matrices of size $N/8\times N/8$ and can be further partitioned. (b) Graph corresponding to a hierarchical low-rank matrix with three levels. Here, each vertex is a low-rank block of the matrix, where two vertices are connected if their low-rank blocks occupy the same row. At each level, the number of required matrix-vector input probes to recover that level is proportional to the coloring number of the graph when restricted to submatrices of the same size. In this case, the submatrices that are identically colored can be recovered simultaneously.
Figure 4: Schematic diagram of a deep operator network (DeepONet). A DeepONet parametrizes a neural operator using a branch network and a trunk network. The branch network encodes the input function $f$ as a vector of $p$ features, which is then multiplied by the output of the trunk network to yield a rank-$p$ representation of the solution $u$.
Figure 5: Schematic diagram of a Fourier neural operator (FNO). The networks P and Q, respectively, lift the input function $f$ to a higher dimensional space and project the output of the last Fourier layer to the output dimension. An FNO mainly consists of a succession of Fourier layers, which perform the integral operations in neural operators as a convolution in the Fourier domain and component-wise composition with an activation function $\sigma$.
...and 6 more figures
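
The banded recovery count quoted in the Figure 2 caption ($2w+1$ matrix-vector products for bandwidth $w$) can be illustrated with a minimal NumPy sketch. This is not code from the paper: the names `recover_banded` and `matvec` are hypothetical, and the probing scheme (one probe vector per residue class of the column index modulo $2w+1$, which is one valid coloring of the graph in Figure 2) is an assumption about how the coloring idea could be implemented.

```python
import numpy as np

def recover_banded(matvec, n, w):
    """Recover an n x n matrix with bandwidth w using 2w+1 matrix-vector products.

    `matvec` is assumed to be a black box returning A @ x for a vector x.
    """
    s = 2 * w + 1                      # number of colors / probe vectors
    A = np.zeros((n, n))
    for c in range(s):
        # Probe vector: sum of the unit vectors e_j with j = c (mod s).
        # Columns in the same class have disjoint support inside the band.
        x = np.zeros(n)
        x[c::s] = 1.0
        y = matvec(x)
        for i in range(n):
            lo, hi = max(0, i - w), min(n - 1, i + w)
            # Smallest column index j >= lo with j = c (mod s); the band in
            # row i spans at most s columns, so it is the only candidate.
            j = lo + (c - lo) % s
            if j <= hi:
                A[i, j] = y[i]
    return A

# Example matching Figure 2: a 12 x 12 matrix with bandwidth 2 (5 diagonals).
n, w = 12, 2
rng = np.random.default_rng(0)
idx = np.arange(n)
band_mask = np.abs(np.subtract.outer(idx, idx)) <= w
A_true = rng.standard_normal((n, n)) * band_mask

calls = {"n": 0}                       # counts how many probes are used

def matvec(x):
    calls["n"] += 1
    return A_true @ x

A_rec = recover_banded(matvec, n, w)
print(calls["n"], np.allclose(A_rec, A_true))
```

In this sketch every row of the band contains at most one column from each residue class, so each entry of a probe response can be attributed to a single matrix entry. A dense matrix of the same size would instead need $N$ probes.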
