Operator Learning Using Random Features: A Tool for Scientific Computing

Nicholas H. Nelsen; Andrew M. Stuart

TL;DR

The paper addresses learning operators between infinite-dimensional function spaces to accelerate many-query PDE tasks. It introduces the function-valued random feature model (RFM), a data-driven surrogate whose training reduces to a finite-dimensional convex quadratic problem and which corresponds to a low-rank approximation of operator-valued kernel ridge regression. The authors provide convergence guarantees and error bounds, and demonstrate mesh-invariant, transferable performance on operator learning problems for Burgers' equation and Darcy flow. The approach offers a nonintrusive, scalable alternative to deep neural operators for scientific computing and uncertainty quantification.

Abstract

Supervised operator learning centers on the use of training data, in the form of input-output pairs, to estimate maps between infinite-dimensional spaces. It is emerging as a powerful tool to complement traditional scientific computing, which may often be framed in terms of operators mapping between spaces of functions. Building on the classical random features methodology for scalar regression, this paper introduces the function-valued random features method. This leads to a supervised operator learning architecture that is practical for nonlinear problems yet is structured enough to facilitate efficient training through the optimization of a convex, quadratic cost. Due to the quadratic structure, the trained model is equipped with convergence guarantees and error and complexity bounds, properties that are not readily available for most other operator learning architectures. At its core, the proposed approach builds a linear combination of random operators. This turns out to be a low-rank approximation of an operator-valued kernel ridge regression algorithm, and hence the method also has strong connections to Gaussian process regression. The paper designs function-valued random features that are tailored to the structure of two nonlinear operator learning benchmark problems arising from parametric partial differential equations. Numerical results demonstrate the scalability, discretization invariance, and transferability of the function-valued random features method.
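The abstract's description (a linear combination of random operators, trained by minimizing a convex quadratic cost) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy target operator `target_operator` (pointwise squaring), the input distribution in `sample_inputs`, the pointwise `tanh` feature map, the $1/m$ normalization, and the values of `m`, `n`, and `lam`. The paper instead designs features tailored to Burgers' equation and Darcy flow.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 64                      # number of grid points on [0, 1)
x = np.arange(K) / K
dx = 1.0 / K                # quadrature weight for discrete L2 inner products

def sample_inputs(n):
    """Hypothetical input distribution: random mixes of one sine and one cosine mode."""
    c = rng.uniform(-1.0, 1.0, size=(n, 2))
    return c[:, :1] * np.sin(2 * np.pi * x) + c[:, 1:] * np.cos(2 * np.pi * x)

def target_operator(a):
    """Toy nonlinear operator (illustrative assumption): a -> a^2 pointwise."""
    return a ** 2

def feature_map(a, w, b):
    """Function-valued features phi(a; theta_j)(x) = tanh(w_j a(x) + b_j), shape (n, m, K)."""
    return np.tanh(w[None, :, None] * a[:, None, :] + b[None, :, None])

def fit(a, y, w, b, lam):
    """Minimize sum_i ||y_i - (1/m) sum_j alpha_j phi_j(a_i)||^2 + lam * ||alpha||^2."""
    phi = feature_map(a, w, b)
    m = w.size
    # Gram matrix and right-hand side of the normal equations (L2 inner products on the grid).
    gram = np.einsum("njk,nlk->jl", phi, phi) * dx / m**2
    rhs = np.einsum("njk,nk->j", phi, y) * dx / m
    return np.linalg.solve(gram + lam * np.eye(m), rhs)

def predict(a, w, b, alpha):
    """Evaluate the trained model (1/m) sum_j alpha_j phi(a; theta_j) on the grid."""
    return np.einsum("j,njk->nk", alpha, feature_map(a, w, b)) / w.size

m, n, lam = 64, 50, 1e-8
w, b = rng.normal(size=m), rng.uniform(-1.0, 1.0, size=m)   # random, then frozen
a_train = sample_inputs(n)
alpha = fit(a_train, target_operator(a_train), w, b, lam)

a_test = sample_inputs(20)
y_test = target_operator(a_test)
err = np.linalg.norm(predict(a_test, w, b, alpha) - y_test, axis=1)
rel_err = float(np.mean(err / np.linalg.norm(y_test, axis=1)))
print(f"mean relative L2 test error: {rel_err:.3e}")
```

Only the coefficient vector `alpha` is trained; the random parameters `w` and `b` stay fixed after sampling, which is why training reduces to a single linear solve.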

Paper Structure (19 sections, 2 theorems, 53 equations, 9 figures, 1 table)

Introduction
Literature Review
Contributions
Methodology
Problem Formulation
Operator-Valued Reproducing Kernels
Random Feature Model
An Intractable Nonparametric Model Class
A Tractable Parametric Model Class
Connection to Neural Networks and Neural Operators
Optimization
Error Bounds
Application to PDE Solution Operators
Burgers' Equation: Formulation
Darcy Flow: Formulation
...and 4 more sections

Key Result

Theorem 2.11

Let Assumption \ref{ass:theory} hold. Suppose that the integral operator $T_{k_\mu}\in\mathcal{L}(L^2_\nu(\mathcal{X};\mathcal{Y}))$ in \eqref{eqn:integral_operator} is injective. Let $\{\delta_l\}_{l\in\mathbb{N}}\subset (0,1)$ be any positive sequence with the property that $\sum_{l=1}^\infty \delta_l$ [...] then the trained RFM satisfies [...]

Figures (9)

Figure 1: Brownian bridge RFM for one-dimensional input-output spaces with $n=32$ training points fixed and $\lambda=0$ (Example 2.9): As $m\to\infty$, the RFM approaches the nonparametric interpolant given by the representer theorem (Figure \ref{fig:bb_compare}(\ref{fig:bb4})), which in this case is a piecewise linear approximation of the true function (an element of the RKHS $\mathcal{H}_{k_{\mu}}=H_{0}^1$, shown in red). Blue lines denote the trained model evaluated on test data points and black circles denote evaluation at training points.
Figure 1: Random feature map construction for Burgers' equation: Figure \ref{fig:rf_and_filter}(\ref{fig:rf_sample_burg}) displays a representative input-output pair for the random feature $\varphi({\,\cdot\,};\theta)$ with $\theta\sim\mu$ (\ref{eqn:rf_fourier}), while Figure \ref{fig:rf_and_filter}(\ref{fig:filter_func1}) shows the filter $k\mapsto \chi(k)$ for $\delta=0.0025$ and $\beta=4$ (\ref{eqn:filter}).
Figure 1: Representative input-output test sample for the Burgers' equation solution map $F^{\dagger}\coloneqq \Psi_{1}$: Figure \ref{fig:sample_burg}(\ref{fig:prediction_onesample}) shows a sample input, output (truth), and trained RFM prediction (test), while Figure \ref{fig:sample_burg}(\ref{fig:pwerror_onesample}) displays the pointwise error. The relative $L^2$ error for this single prediction is $0.0146$. Here, $n=512$, $m=1024$, and $K=1025$.
Figure 2: Random feature map construction for Darcy flow: Figure \ref{fig:rf_and_coef_darcy}(\ref{fig:coef_darcy}) displays a representative input draw $a$ with $\tau=3,\, \alpha=2$ and $a^{+}=12,\, a^{-}=3$; Figure \ref{fig:rf_and_coef_darcy}(\ref{fig:rf_darcy}) shows the output random feature $\varphi(a;\theta)$ (equation \ref{eqn:rf_predictor_corrector}) taking the coefficient $a$ as input. Here, $f\equiv 1$, $\tau'=7.5,\, \alpha'=2$, $s^{+}=1/a^{+}$, $s^{-}=-1/a^{-}$, and $\delta = 0.15$.
Figure 2: Expected relative test error of a trained RFM for the Burgers' evolution operator $F^{\dagger}=\Psi_{1}$ with $n'=4000$ test pairs: Figure \ref{fig:gridtranfser_burg}(\ref{fig:gridtransfer_burg_panel}) displays the invariance of test error with respect to training and testing on different resolutions for $m=1024$ and $n=512$ fixed; the RFM can train and test on different mesh sizes without loss of accuracy. Figure \ref{fig:gridtranfser_burg}(\ref{fig:gridsweep_burg_n}) shows the decay of the test error for resolution $K=129$ fixed as a function of $m$ and $n$; the error closely follows the $O(m^{-1/2})$ Monte Carlo rate, and the smallest error achieved is $0.0303$ for $n=1000$ and $m=1024$.
...and 4 more figures
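
The $O(m^{-1/2})$ Monte Carlo rate mentioned in the Burgers' test-error caption above is the generic rate at which an average over $m$ independent random draws approaches its expectation. The sketch below checks that rate on a scalar toy quantity, $\mathbb{E}[\cos^2\theta]$ with $\theta\sim N(0,1)$; the quantity, the values in `ms`, and the trial count are illustrative assumptions and do not reproduce the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exact value: E[cos^2(theta)] = (1 + E[cos(2 theta)]) / 2 = (1 + exp(-2)) / 2 for theta ~ N(0, 1).
exact = 0.5 * (1.0 + np.exp(-2.0))

ms = np.array([16, 64, 256, 1024])   # number of random draws per estimate
trials = 200                         # independent repetitions used to estimate the RMS error

rmse = []
for m in ms:
    theta = rng.normal(size=(trials, m))
    estimates = np.mean(np.cos(theta) ** 2, axis=1)   # one m-sample average per trial
    rmse.append(np.sqrt(np.mean((estimates - exact) ** 2)))
rmse = np.array(rmse)

# Slope of log(error) versus log(m); a value near -1/2 indicates the Monte Carlo rate.
slope = np.polyfit(np.log(ms), np.log(rmse), 1)[0]
print("RMS errors:", rmse)
print("fitted slope:", slope)
```

A fitted slope close to $-1/2$ on a log-log plot is the same diagnostic the caption refers to, applied there to the RFM test error as a function of $m$.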

Theorems & Definitions (5)

Definition 2.2: operator-valued kernel
Definition 2.7: RFM
Example 2.9: Brownian bridge
Theorem 2.11: almost sure convergence of trained RFM
Theorem 2.12: complexity bounds for trained RFM
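
For orientation, the objects named in Definitions 2.2 and 2.7 can be written schematically as follows. This is a sketch consistent with the abstract ("a linear combination of random operators" that is "a low-rank approximation of an operator-valued kernel ridge regression algorithm") and with the notation $\varphi(\,\cdot\,;\theta)$, $\theta\sim\mu$, $k_\mu$ used in the figure captions and Theorem 2.11. The $1/m$ normalization and the coefficient symbols $\alpha_j$ are assumptions; the paper's exact statements may differ.

$$
F_m(a;\alpha) = \frac{1}{m}\sum_{j=1}^{m}\alpha_j\,\varphi(a;\theta_j), \qquad \theta_j \overset{\text{iid}}{\sim} \mu,
$$

$$
k^{(m)}(a,a') = \frac{1}{m}\sum_{j=1}^{m}\varphi(a;\theta_j)\otimes\varphi(a';\theta_j) \;\approx\; k_\mu(a,a') = \mathbb{E}_{\theta\sim\mu}\big[\varphi(a;\theta)\otimes\varphi(a';\theta)\big].
$$

The empirical kernel $k^{(m)}$ has rank at most $m$, which is the sense in which the trained RFM is a low-rank approximation of kernel ridge regression with the kernel $k_\mu$.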
