Extending Neural Operators: Robust Handling of Functions Beyond the Training Set

Blaine Quackenbush; Paul J. Atzberger

TL;DR

A rigorous framework for extending neural operators to handle out-of-distribution input functions by leveraging kernel approximation techniques and providing theory for characterizing the input-output function spaces in terms of Reproducing Kernel Hilbert Spaces (RKHSs).

Abstract

We develop a rigorous framework for extending neural operators to handle out-of-distribution input functions. We leverage kernel approximation techniques and provide theory for characterizing the input-output function spaces in terms of Reproducing Kernel Hilbert Spaces (RKHSs). We provide theorems on the requirements for reliable extensions and their predicted approximation accuracy. We also establish formal relationships between specific kernel choices and their corresponding Sobolev Native Spaces. This connection further allows the extended neural operators to reliably capture not only function values but also their derivatives. Our methods are empirically validated through the solution of elliptic partial differential equations (PDEs) involving operators on manifolds having point-cloud representations and handling geometric contributions. We report results on key factors impacting the accuracy and computational performance of the extension approaches.
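The kernel approximation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a 1D domain, a Wendland $C^2$ kernel $\phi(r) = (1-r)_+^4(4r+1)$, a support radius `ell`, a small ridge parameter `lam`, and a test function $\sin(2\pi x)$; all of these names and values are illustrative choices. It fits a regularized kernel approximation $\tilde{f}(x) = \sum_i c_i k(x, x_i)$ and evaluates both $\tilde{f}$ and its derivative.

```python
import numpy as np

def wendland_c2(r):
    """Wendland C^2 function phi(r) = (1 - r)_+^4 (4 r + 1)."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def wendland_c2_dr(r):
    """Radial derivative d(phi)/dr = -20 r (1 - r)_+^3."""
    return np.where(r < 1.0, -20.0 * r * (1.0 - r) ** 3, 0.0)

def fit_kernel_coeffs(x_nodes, f_nodes, ell, lam):
    """Solve (K + lam I) c = f for the coefficients c (assumed ridge form)."""
    r = np.abs(x_nodes[:, None] - x_nodes[None, :]) / ell
    K = wendland_c2(r)
    return np.linalg.solve(K + lam * np.eye(len(x_nodes)), f_nodes)

def eval_kernel_approx(x, x_nodes, coeffs, ell):
    """Evaluate f_tilde(x) = sum_i c_i k(x, x_i) and its derivative in x."""
    diff = x[:, None] - x_nodes[None, :]
    r = np.abs(diff) / ell
    values = wendland_c2(r) @ coeffs
    # Chain rule: d/dx phi(|x - x_i| / ell) = phi'(r) * sign(x - x_i) / ell
    derivs = (wendland_c2_dr(r) * np.sign(diff) / ell) @ coeffs
    return values, derivs

# Illustrative setup: 50 sample points on [0, 1] and a smooth test function.
ell, lam = 0.5, 1e-10
x_nodes = np.linspace(0.0, 1.0, 50)
f = lambda x: np.sin(2.0 * np.pi * x)
df = lambda x: 2.0 * np.pi * np.cos(2.0 * np.pi * x)

coeffs = fit_kernel_coeffs(x_nodes, f(x_nodes), ell, lam)

# Compare on interior points that are not in the sample set.
x_test = np.linspace(0.25, 0.75, 201)
values, derivs = eval_kernel_approx(x_test, x_nodes, coeffs, ell)
err_f = np.max(np.abs(values - f(x_test)))
err_df = np.max(np.abs(derivs - df(x_test)))
print("max error in f:", err_f)
print("max error in f':", err_df)
```

The derivative is obtained by differentiating the kernel expansion directly, which is the reason the smoothness of the kernel (and hence its native space) matters when derivatives are needed as well as function values.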

Paper Structure

This paper contains 14 sections, 4 theorems, 46 equations, 5 figures, and 6 tables.

Neural Operators
Approximation of Solution Operators for Partial Differential Equations.
Theory for Extending Neural Operators
Native Spaces for Kernel Approximations.
Function Approximation using Kernel Methods
Function Approximation using Regularized Kernel Methods
Proofs for the Kernel Extension Theorems \ref{thm:op_extend} and \ref{thm:op_extend_manifold}.
Proof of Theorem \ref{thm:op_extend}.
Proof of Theorem \ref{thm:op_extend_manifold}.
Training Methods for the Neural Operators and Sobolev Loss
Approximating Kernel Integral Operators using Separable Factors.
Sobolev Training of the Neural Operators
Results: Accuracy of the Neural Operators and Extension Methods
Additional Results: Role of Kernel Choices and Sources of Error

Key Result

Theorem 1.1

Let $\Omega \subset \mathbb{R}^d$ be a compact set and $k(\cdot,\cdot)$ a symmetric positive-definite kernel with a native space that is norm-equivalent to the Sobolev space $\mathcal{H}^s(\Omega)$ for $\lceil s \rceil > d/2 + 2$. Suppose the operator $\mathcal{S}$ is bounded with $\lVert \mathcal{S}(f) \rVert \ldots$ and the kernel approximation function $\tilde{f}$ satisfies $\ldots$ Then, the approximated solution using $\ldots$

Figures (5)

Figure 1: Operator Extension Methods. We develop methods for extending conventional neural operators and geometric neural operators to robustly handle inputs beyond the functions in the training data. We leverage properties of kernel approximations. Neural operators process an input function $w(\cdot)$, and geometric neural operators also process information from the geometry $\Phi(\cdot)$. The operators use a combination of a featurizing lifting operator $\mathcal{P}$, kernel operator layers, and a projection operator $\mathcal{Q}$ to obtain the output function $u(\cdot)$ (top). We obtain representations for input functions in terms of kernels $k_\sigma$ (lower-left). The kernel constructions are shown for a subset of the points in the sum. The operator layer combines a kernel integration $\mathcal{K}[v]$ and a linear operation $W[v]$, and the result is passed through a non-linear activation function $\sigma(\cdot)$ (lower-center). The kernel representations are then used to construct the output function $\tilde{u}(\cdot)$ (lower-right).
Figure 2: Kernels Restricted to Manifolds $k_{\mathcal{M}}$. We develop theory for kernel approximation based on restricting kernels from the bulk ambient space $k(x,y)$ to a manifold surface $k_{\mathcal{M}}(x,y)$. Even when the bulk kernel $k$ has nice properties and symmetries, such as being a radial basis function, the restricted kernels $k_{\mathcal{M}}$ may not inherit these attributes. The $k_{\mathcal{M}}$ may no longer be radially symmetric, $k_{\mathcal{M}}(x,y) \neq k_{\mathcal{M}}(|x - y|)$, or translation invariant, $k_{\mathcal{M}}(x,y) \neq k_{\mathcal{M}}(x,y+\Delta)$.
Figure 3: Kernels. We show a few different kernels used in our approximations and comparison studies. For the Gaussian kernels, we use $\Phi(\sigma \cdot r)$ with $\sigma = \ell_0^{-1}$ and $\ell_0 = 0.2, 0.3, 0.4$ for the cases labeled $A$, $B$, $C$. For the Matérn kernels, the $A$, $B$, $C$ cases correspond to the Basic, Linear, and Quadratic kernels with the parameters shown in Table \ref{table:sobolev_kernels}. For the Wendland kernels, we consider the cases with $k=0$, $k=1$, $k=2$ with parameters in Table \ref{table:sobolev_kernels}.
Figure 4: Kernel Integration: Methods for Improving Performance. In neural operators, a major computational cost during both training and evaluation is computing the kernel integral operations $\mathcal{K}[v]$. An edge-conditioned convolution is widely used for this approximation, but it scales as $O(N^2)$ and becomes prohibitively expensive as the density of points increases within the support of the kernel (top). We develop more computationally efficient alternatives using node-conditioned convolutions based on factoring kernels into a separable form $k(x,y) = k_1(x)k_2(y)$. The node-based methods scale as $O(N)$, allowing for approximations using efficient gathering and scattering operations that greatly reduce computational costs (bottom).
Figure 5: Manifold Shapes and Kernel Restrictions. We show the manifolds used in our comparison studies. We also show how the Wendland kernel with parameter $k=0$ behaves when restricted to the manifold surface to obtain $k_{\mathcal{M}}$. During training, the different kernels are centered at a sampling of locations $\{x_i\}$ on the manifold, providing a collection of functions $k_{\sigma}(\cdot,x_i)$.
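The cost difference described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the factor functions `k1` and `k2` are hypothetical, the kernel is taken as a sum of `R` separable terms $k(x,y) = \sum_r k_{1,r}(x) k_{2,r}(y)$, the integral is approximated by a uniform average over all $N$ points (no local support), and the feature width `C` is arbitrary. The dense version forms all $N^2$ pairwise kernel values, while the separable version first reduces over the points and then broadcasts back, costing $O(N)$ in the number of points.

```python
import numpy as np

rng = np.random.default_rng(0)
N, R, C = 500, 4, 8                      # points, separable terms, feature channels
x = rng.uniform(-1.0, 1.0, size=(N, 3))  # point-cloud sample locations
v = rng.normal(size=(N, C))              # feature function values v(x_j)

def k1(x):
    """Hypothetical output-side factors k_{1,r}(x), shape (N, R)."""
    sq = np.sum(x ** 2, axis=1)
    return np.stack([np.exp(-(r + 1) * sq) for r in range(R)], axis=1)

def k2(y):
    """Hypothetical input-side factors k_{2,r}(y), shape (N, R)."""
    return np.stack([np.cos((r + 1) * y[:, 0]) for r in range(R)], axis=1)

def kernel_integral_dense(x, v):
    """Pairwise (edge-style) evaluation: builds the full N x N kernel matrix."""
    K = k1(x) @ k2(x).T                  # K[i, j] = sum_r k1_r(x_i) k2_r(x_j)
    return K @ v / len(x)                # (1/N) sum_j K[i, j] v(x_j)

def kernel_integral_separable(x, v):
    """Node-style evaluation: reduce over points once, then broadcast back."""
    moments = k2(x).T @ v / len(x)       # (R, C): (1/N) sum_j k2_r(x_j) v(x_j)
    return k1(x) @ moments               # (N, C): sum_r k1_r(x_i) * moments[r]

out_dense = kernel_integral_dense(x, v)
out_sep = kernel_integral_separable(x, v)
print("max difference:", np.max(np.abs(out_dense - out_sep)))
```

Both routines compute the same quantity. The separable form only reorders the sums, so the $N \times N$ matrix is never formed.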

Theorems & Definitions (6)

Theorem 1.1
Theorem 1.2
Theorem 2.1
Theorem 2.2
proof
proof
