The Parametric Complexity of Operator Learning

Samuel Lanthaler; Andrew M. Stuart

TL;DR

A novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system, and can provably beat the curse of parametric complexity related to the infinite-dimensional input and output function spaces.

Abstract

Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a "curse of parametric complexity", which is an infinite-dimensional analogue of the well-known curse of dimensionality encountered in high-dimensional approximation problems. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that this general curse can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of parametric complexity related to the infinite-dimensional input and output function spaces.

Paper Structure (51 sections, 30 theorems, 280 equations, 1 figure, 2 algorithms)

Introduction
Context and Literature Review
Organization
The Curse of Parametric Complexity
Curse of Dimensionality for Neural Networks
ReLU Neural Networks
Two Simple Facts from ReLU Neural Network Calculus
Approximation Theory and CoD for ReLU Networks
Curse of Parametric Complexity in Operator Learning
Infinite-dimensional hypercubes
Curse of Parametric Complexity
Assume $K \subset \mathcal{X}$ contains a hypercube $Q_\alpha$.
Assume $\mathcal{Y} = \mathbb{R}$, i.e. $\mathcal{S}^\dagger$ is a functional.
Assume $\mathcal{S}$ is of neural network-type.
Main Theorem on Curse of Parametric Complexity
...and 36 more sections

Key Result

proposition 2.0

Let $r\in \mathbb{N}$ be given. For any dimension $D\in \mathbb{N}$, there exist $f_{D,r} \in C^r([0,1]^D;\mathbb{R})$ and constants $\overline{\epsilon},\gamma > 0$, such that any ReLU neural network $\Psi: \mathbb{R}^D \to \mathbb{R}$ achieving accuracy $\sup_{x\in[0,1]^D} |f_{D,r}(x) - \Psi(x)| \le \epsilon$ with $\epsilon \le \overline{\epsilon}$ has size at least $\mathrm{size}(\Psi) \ge \epsilon^{-\gamma D/r}$. The constant $\overline{\epsilon}
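To get a feel for the size bound $\epsilon^{-\gamma D/r}$ in the proposition, the following minimal sketch simply evaluates that expression for a few dimensions. The function name `size_lower_bound` and the default `gamma=1.0` are assumptions made for illustration only; the proposition asserts that some $\gamma > 0$ exists but its value is not given here, so the printed numbers show the shape of the growth, not actual network sizes.

```python
def size_lower_bound(eps: float, D: int, r: int, gamma: float = 1.0) -> float:
    """Evaluate eps ** (-gamma * D / r), the form of the bound in the proposition.

    gamma=1.0 is a placeholder (assumption): the source only states that a
    constant gamma > 0 exists, not what it is.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if D < 1 or r < 1:
        raise ValueError("D and r must be positive integers")
    return eps ** (-gamma * D / r)


if __name__ == "__main__":
    eps = 0.1
    for r in (1, 2):
        for D in (1, 2, 5, 10, 20):
            # For fixed eps and r, the bound grows exponentially in D.
            print(f"r={r}  D={D:2d}  bound={size_lower_bound(eps, D, r):.3e}")
```

With these placeholder values, raising the smoothness $r$ slows the growth but does not remove the exponential dependence on $D$, which is the finite-dimensional behaviour that the curse of parametric complexity extends to operators.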

Figures (1)

Figure 1: Diagrammatic illustration of operator learning based on an encoding $\mathcal{E}$, a neural network $\Psi$, and a reconstruction $\mathcal{R}$.
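The composition described in the caption, an encoding $\mathcal{E}$ followed by a neural network $\Psi$ and a reconstruction $\mathcal{R}$, can be sketched in a few lines. Everything concrete below is an assumption chosen for illustration and is not taken from the paper: the encoder is point evaluation at fixed sensor locations, the network is a small ReLU network with random untrained weights, and the reconstruction expands the network output in a cosine basis. All function names are hypothetical.

```python
import math
import random


def encode(u, sensors):
    """Encoder E (assumed): evaluate the input function u at fixed sensor points."""
    return [u(x) for x in sensors]


def relu_network(z, layers):
    """Network Psi: ReLU MLP; `layers` is a list of (W, b) with W a list of rows."""
    for i, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:  # no activation on the output layer
            z = [max(0.0, v) for v in z]
    return z


def reconstruct(coeffs, basis):
    """Reconstruction R (assumed): coefficients -> function y -> sum_k c_k * basis_k(y)."""
    return lambda y: sum(c * phi(y) for c, phi in zip(coeffs, basis))


def make_operator(sensors, layers, basis):
    """Compose S = R o Psi o E, mapping a function to a function."""
    def S(u):
        return reconstruct(relu_network(encode(u, sensors), layers), basis)
    return S


def random_layer(n_in, n_out, rng):
    """Random (untrained) weights, only so the sketch has something to run."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return W, b


rng = random.Random(0)
sensors = [i / 7 for i in range(8)]
layers = [random_layer(8, 16, rng), random_layer(16, 4, rng)]
basis = [lambda y, k=k: math.cos(k * math.pi * y) for k in range(4)]

S = make_operator(sensors, layers, basis)
v = S(math.sin)  # the output of the operator is itself a function
print(v(0.25))
```

In this sketch the only trainable part is the finite-dimensional map between the encoded input and the output coefficients, so the number of weights in `layers` plays the role of the network size in the complexity estimates.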

Theorems & Definitions (79)

proposition 2.0: Neural Network CoD
example 2.1
definition 2.2
remark 2.3
remark 2.4
remark 2.5
lemma 2.6
definition 2.7: Functional of neural network-type
remark 2.8
definition 2.9: Operator of neural network-type
...and 69 more
