Nonlocality and Nonlinearity Implies Universality in Operator Learning

Samuel Lanthaler; Zongyi Li; Andrew M. Stuart

TL;DR

The paper shows that universal approximation for neural operators can be achieved with minimal nonlocality by incorporating a simple averaging operation. It introduces the Averaging Neural Operator (ANO) as a universal, low-complexity architecture that unifies many neural-operator frameworks (including FNO) as special cases under specific kernels and encodings. It provides universal approximation theorems in both $C^s$ and Sobolev spaces and supports the theory with numerical experiments on Helmholtz, Darcy, and Kolmogorov-flow problems, illustrating the practical trade-offs between channel width and Fourier modes. The work clarifies the fundamental role of nonlocality in operator learning and offers a unifying perspective for designing universal neural-operator architectures on nonperiodic domains and across diverse function spaces.

Abstract

Neural operator architectures approximate operators between infinite-dimensional Banach spaces of functions. They are gaining increased attention in computational science and engineering, due to their potential both to accelerate traditional numerical methods and to enable data-driven discovery. As the field is in its infancy, basic questions about minimal requirements for universal approximation remain open. It is clear that any general approximation of operators between spaces of functions must be both nonlocal and nonlinear. In this paper we describe how these two attributes may be combined in a simple way to deduce universal approximation. In so doing we unify the analysis of a wide range of neural operator architectures and open up consideration of new ones. A popular variant of neural operators is the Fourier neural operator (FNO). Previous analysis proving universal operator approximation theorems for FNOs resorts to the use of an unbounded number of Fourier modes, relying on intuition from traditional analysis of spectral methods. The present work challenges this point of view: (i) the work reduces FNO to its core essence, resulting in a minimal architecture termed the "averaging neural operator" (ANO); and (ii) analysis of the ANO shows that even this minimal architecture benefits from universal approximation. This result is obtained with a spatial average as the only nonlocal ingredient (corresponding to retaining only a single Fourier mode in the special case of the FNO). The analysis paves the way for a more systematic exploration of nonlocality, both through the development of new operator learning architectures and the analysis of existing and new architectures. Numerical results are presented which give insight into complexity issues related to the roles of channel width (embedding dimension) and number of Fourier modes.
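To make the idea of "a spatial average as the only nonlocal ingredient" concrete, here is a minimal NumPy sketch of one hidden layer of this kind. It is an illustration under stated assumptions, not the authors' implementation: the function is sampled on a uniform grid so that the spatial average becomes a plain mean over grid points, and the names `ano_layer`, `W`, `V`, `b`, the separate matrix `V` acting on the average, and the `tanh` activation are all hypothetical choices made for this example.

```python
import numpy as np

def ano_layer(v, W, V, b, activation=np.tanh):
    """One averaging-type layer (illustrative sketch, not the paper's code).

    v : array of shape (n_points, d_c), samples of a vector-valued
        function on a uniform grid (assumption of this sketch).
    W : (d_c, d_c) matrix applied pointwise (local, linear part).
    V : (d_c, d_c) matrix applied to the spatial average (nonlocal part).
    b : (d_c,) bias.
    """
    # The mean over grid points approximates the spatial average of v.
    mean = v.mean(axis=0)
    # Local term + nonlocal average term + bias, then pointwise nonlinearity.
    return activation(v @ W.T + mean @ V.T + b)

rng = np.random.default_rng(0)
n_points, d_c = 64, 8
x = np.linspace(0.0, 1.0, n_points)
# Hypothetical input: d_c channels, each a sine of a different frequency.
v = np.stack([np.sin(2 * np.pi * k * x) for k in range(1, d_c + 1)], axis=1)
W = rng.normal(size=(d_c, d_c)) / np.sqrt(d_c)
V = rng.normal(size=(d_c, d_c)) / np.sqrt(d_c)
b = rng.normal(size=d_c)

out = ano_layer(v, W, V, b)
print(out.shape)
```

With `V` set to zero the layer acts independently at each grid point, so perturbing the input at one point changes the output only there. With a nonzero `V`, the same perturbation shifts the average and therefore changes the output everywhere, which is the sense in which the average supplies nonlocality.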

Paper Structure (37 sections, 15 theorems, 111 equations, 2 figures)

Introduction
Motivation and Literature Review
Neural Operator
Averaging Suffices for Universal Approximation
Nonlinearity and Nonlocality
Averaging Neural Operator: a Special Subclass of the NNO
Universal Approximation
Intuition
Encoder-Decoder Structure
The Role of Positional Encodings
Other Extensions and Variants
Sketch of the Proof of Universality
Connection With Other Neural Operator Architectures
Neural Operator With General Integral Kernel
Low-Rank Neural Operator
...and 22 more sections

Key Result

theorem 2.1

Let $\Omega \subset \mathbb{R}^d$ be a bounded domain with Lipschitz boundary. For given integers $s,s'\ge 0$, let $\Psi^\dagger: C^{s}(\overline{\Omega}; \mathbb{R}^k) \to C^{s'}(\overline{\Omega}; \mathbb{R}^{k'})$ be a continuous operator, and fix a compact set $\mathsf{K}\subset C^s(\overline{\O

Figures (2)

Figure 1: FNO model with different combinations of channel dimension (width) $d_c$ and Fourier modes $K$. Within each curve, the models have roughly the same number of parameters because $d_c K$ is fixed at $C$, with $C$ changing between curves. As $C$ increases, the overall error drops. The curve at any given fixed $C$ exhibits a "U"-shape, where the valley determines the optimal choice of modes $K$. The black dotted line is the error given by simply truncating the truth at the given number $K$ of Fourier modes ("Fourier truncation"). Two things are notable: (i) for Helmholtz the optimal number of modes is fixed as the computational budget increases, whilst for the two Darcy problems and Kolmogorov flow it grows; (ii) in all examples the trained FNO considerably improves on the Fourier truncation when smaller numbers of Fourier modes are used; this suggests that nonlinear approximation-theoretic mechanisms, resulting from the architecture, are at play.
Figure 2: The effect of normalization demonstrated for the Helmholtz equation example. (a) Error rates as shown in Figure 1(a). (b) Same as Figure 1(a) but with error rates of the Fourier truncation including the effect of normalization. (c) and (d) show the mean and standard deviation, respectively, as used in the normalization process. Normalization assists in capturing lower modes but it does not affect the higher modes.
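
The "Fourier truncation" baseline mentioned in the Figure 1 caption can be illustrated with a short NumPy sketch. This is an assumption-laden toy, not the experimental code: it uses a one-dimensional real periodic signal, takes "K modes" to mean wavenumbers 0 through K, and measures a relative discrete $L^2$ error. The helper names `fourier_truncate` and `relative_l2_error` and the test signal are hypothetical.

```python
import numpy as np

def fourier_truncate(u, K):
    """Keep wavenumbers 0..K of a real periodic signal on a uniform grid."""
    u_hat = np.fft.rfft(u)
    u_hat[K + 1:] = 0.0          # discard all modes above wavenumber K
    return np.fft.irfft(u_hat, n=len(u))

def relative_l2_error(u_true, u_approx):
    """Relative discrete L2 error between two sampled signals."""
    return np.linalg.norm(u_true - u_approx) / np.linalg.norm(u_true)

n = 128
x = np.arange(n) / n
# Hypothetical "truth": one low-frequency and one higher-frequency component.
u = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 5 * x)

errors = {K: relative_l2_error(u, fourier_truncate(u, K)) for K in (1, 3, 5)}
for K, err in errors.items():
    print(K, err)
```

In this toy signal the truncation error stays large until the cutoff reaches the highest active wavenumber, then drops to round-off. A purely linear truncation cannot do better than this at a given `K`, which is the reference line against which the caption compares the trained FNO.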

Theorems & Definitions (28)

theorem 2.1
theorem 2.2
remark 2.3
remark 2.4
corollary 3.1: General Neural Operator
corollary 3.2: Low-Rank Neural Operator
corollary 3.3: Fourier Neural Operator
corollary 3.4: Wavelet Neural Operator
corollary 3.5: Laplace Neural Operator
lemma A.1: Periodic extension operator, see e.g. kovachki_universal_2021
...and 18 more
