Optimal Convergence Rates for Neural Operators

Mike Nguyen, Nicole Mücke

TL;DR

The paper tackles operator learning between function spaces by placing two-layer neural operators in the neural tangent kernel (NTK) regime and deriving generalization guarantees for early-stopped gradient descent. It develops a vector-valued kernel framework, the vvNTK, proves that the associated vvRKHS can approximate neural operator targets, and establishes minimax optimal rates under a Hölder source condition and an eigenvalue decay assumption. The resulting bounds specify how many hidden units $M$ and second-stage samples $n_{\mathcal X}$ are needed to reach a target accuracy: the error satisfies $\|G_{\theta_T}-G^*\|_{L^2_{\mu_u}} = \tilde{O}(T^{-r}+M^{-1/2}+n_{\mathcal X}^{-1/2})$, and a suitable choice of the stopping time $T$ yields the minimax rate $O(n_{\mathcal U}^{-r/(2r+b)})$. The analysis decomposes the error into a Taylor approximation error, a random-feature-type error, and a generalization error, and includes a weight stability result ensuring that the network stays near its initialization. Experiments on the Poisson equation confirm the theoretical scaling, showing that width and sample requirements scale as $\sqrt{n_{\mathcal U}}$ to realize the optimal rates. Overall, the work provides a principled NTK-based theory for fast-converging, sample-efficient neural operator learning, with practical applications as PDE surrogates.
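As a rough illustration of how the stated bound translates into resource choices, the sketch below balances each term of $\tilde{O}(T^{-r}+M^{-1/2}+n_{\mathcal X}^{-1/2})$ against the target rate $n_{\mathcal U}^{-r/(2r+b)}$. It is a heuristic reading of the TL;DR only: constants and logarithmic factors are ignored, the function name `parameter_choices` is hypothetical, and the paper's theorems give the precise conditions.

```python
import math


def parameter_choices(n_u, r, b):
    """Hypothetical helper: heuristic (T, M, n_X) from balancing the bound's terms.

    Target rate: n_u^(-r / (2r + b)). Constants and log factors hidden by the
    tilde-O are ignored, so the outputs are orders of magnitude, not exact values.
    """
    if n_u < 1 or r <= 0 or b <= 0:
        raise ValueError("need n_u >= 1, r > 0, b > 0")
    # T^(-r) = n_u^(-r/(2r+b))  =>  T = n_u^(1/(2r+b))
    t_exp = 1.0 / (2 * r + b)
    # M^(-1/2) <= n_u^(-r/(2r+b))  =>  M >= n_u^(2r/(2r+b)); same condition for n_X
    m_exp = 2 * r / (2 * r + b)
    eps = 1e-9  # guard so that e.g. 100.00000000000001 is not rounded up to 101
    T = math.ceil(n_u ** t_exp - eps)
    M = math.ceil(n_u ** m_exp - eps)
    n_x = M
    return T, M, n_x


print(parameter_choices(10_000, 0.5, 1.0))
```

For $r = 1/2$ and $b = 1$ both exponents equal $1/2$, so in that case the width and the number of second-stage samples grow like $\sqrt{n_{\mathcal U}}$, in line with the scaling mentioned above.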

Abstract

We introduce the neural tangent kernel (NTK) regime for two-layer neural operators and analyze their generalization properties. For early-stopped gradient descent (GD), we derive fast convergence rates that are known to be minimax optimal within the framework of non-parametric regression in reproducing kernel Hilbert spaces (RKHS). We provide bounds on the number of hidden neurons and the number of second-stage samples necessary for generalization. To justify our NTK regime, we additionally show that any operator approximable by a neural operator can also be approximated by an operator from the RKHS. A key application of neural operators is learning surrogate maps for the solution operators of partial differential equations (PDEs). We consider the standard Poisson equation to illustrate our theoretical findings with simulations.
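The abstract benchmarks early-stopped GD against non-parametric regression in an RKHS. A minimal sketch of that classical RKHS baseline is kernel gradient descent (Landweber iteration) with early stopping. It assumes a scalar-valued problem on $[0,1]$ with a Gaussian kernel standing in for the vector-valued NTK; it is not the authors' neural operator training procedure, and all function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=0.2):
    # Stand-in kernel (assumption); the paper works with the vvNTK instead.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_gradient_descent(xs, ys, n_iters, step=1.0, bandwidth=0.2):
    """Early-stopped GD on the empirical least-squares risk in the RKHS.

    The iterate f_t = sum_i alpha_i k(., x_i) is updated by
    alpha <- alpha - (step / n) * (K alpha - y). Stopping after n_iters steps
    is the only regularization. Returns alpha and the training residual norms
    recorded before each update.
    """
    n = len(xs)
    K = [[gaussian_kernel(a, b, bandwidth) for b in xs] for a in xs]
    alpha = [0.0] * n
    residual_norms = []
    for _ in range(n_iters):
        preds = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        residual = [p - y for p, y in zip(preds, ys)]
        residual_norms.append(math.sqrt(sum(r * r for r in residual)))
        alpha = [a - step / n * r for a, r in zip(alpha, residual)]
    return alpha, residual_norms


def predict(alpha, xs, x, bandwidth=0.2):
    # Evaluate f_T(x) = sum_i alpha_i k(x, x_i).
    return sum(a * gaussian_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))


xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) for x in xs]
alpha, norms = kernel_gradient_descent(xs, ys, n_iters=200)
print(norms[0], norms[-1], predict(alpha, xs, 0.25))
```

The iteration count acts as the regularization parameter: stopping early keeps the fit smooth, while running longer drives the training residual toward zero. With `step=1.0` the residual norm cannot increase, because the Gaussian Gram matrix has unit diagonal, so its largest eigenvalue is at most $n$.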

Paper Structure

This paper contains 21 sections, 26 theorems, 189 equations, 6 figures.

Key Result

Proposition 2.3

Given Assumptions ass:neurons and ass:input, for any $u,u'\in \mathcal{U}$ and any $\delta \in (0,1)$, we have with probability at least $1-\delta$,

Figures (6)

  • Figure 1: Depiction of the architecture of our operator class.
  • Figure 2: A random realization of a polynomial $u$, its solution $v$ and the estimator $NO(u)$.
  • Figure 3: The logarithmic test error for different choices of neurons $M$ and iterations $T$ and fixed $n_{\mathcal{X}} = 50$.
  • Figure 4: The logarithmic test error for different choices of $n_{\mathcal{X}}$ and iterations $T$ and fixed $M=50$.
  • Figure 5: The test error for different choices of $M$ and fixed $T=50$ and $n_{\mathcal{X}}=50$.
  • ...and 1 more figure
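Figure 2 shows a random polynomial $u$ together with its Poisson solution $v$. A minimal sketch of how such input/output pairs could be generated, assuming the one-dimensional problem $-v'' = u$ on $[0,1]$ with $v(0)=v(1)=0$ and polynomial coefficients drawn uniformly from $[-1,1]$. The domain, boundary conditions, coefficient distribution, and all function names are assumptions, not the paper's documented setup; the evaluation grid is only loosely analogous to the second-stage points counted by $n_{\mathcal X}$.

```python
import random


def random_polynomial(degree, rng):
    """Coefficients a_0..a_degree of u(x) = sum_k a_k x^k (uniform on [-1, 1], assumed)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(degree + 1)]


def poisson_solution(coeffs):
    """Coefficients of v solving -v'' = u on [0, 1] with v(0) = v(1) = 0."""
    v = [0.0] * (len(coeffs) + 2)
    # Integrating twice: the term a_k x^k of u contributes -a_k x^(k+2) / ((k+1)(k+2)).
    for k, a in enumerate(coeffs):
        v[k + 2] = -a / ((k + 1) * (k + 2))
    # v(0) = 0 because the constant term is 0; pick the linear term so that v(1) = 0.
    v[1] = -sum(v)
    return v


def evaluate(coeffs, x):
    """Evaluate a polynomial with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def make_poisson_dataset(n_samples, degree, n_grid, seed=0):
    """Return a uniform grid on [0, 1] and (u, v) samples evaluated on it."""
    rng = random.Random(seed)
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    data = []
    for _ in range(n_samples):
        u = random_polynomial(degree, rng)
        v = poisson_solution(u)
        data.append(([evaluate(u, x) for x in grid], [evaluate(v, x) for x in grid]))
    return grid, data


grid, data = make_poisson_dataset(n_samples=5, degree=3, n_grid=11)
print(len(data), data[0][1][0], data[0][1][-1])
```

Because the solution is computed in closed form, every pair is exact up to floating-point error, which keeps the data-generation step from adding its own approximation error to the learning problem.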

Theorems & Definitions (51)

  • Definition 2.2
  • Proposition 2.3
  • Theorem 3.2: Approximation Property
  • Theorem 3.5
  • Corollary 3.6
  • Theorem 3.7: Bound for the Weights
  • Corollary 3.8: Refined Bounds
  • Remark 1: Analysis of NTK Spectrum.
  • Remark 2
  • Proposition B.2
  • ...and 41 more