Operator Learning with Gaussian Processes

Carlos Mora; Amin Yousefpour; Shirin Hosseinmardi; Houman Owhadi; Ramin Bostanabad

TL;DR

This paper introduces a hybrid GP/NN-based framework for operator learning that combines the strengths of both methods and enables accurate zero-shot data-driven predictions without prior training.

Abstract

Operator learning focuses on approximating mappings $\mathcal{G}^\dagger:\mathcal{U} \rightarrow\mathcal{V}$ between infinite-dimensional spaces of functions such as $u: \Omega_u\rightarrow\mathbb{R}$ and $v: \Omega_v\rightarrow\mathbb{R}$. This makes it particularly suitable for solving parametric nonlinear partial differential equations (PDEs). While most machine learning methods for operator learning rely on variants of deep neural networks (NNs), recent studies have shown that Gaussian processes (GPs) are also competitive while offering interpretability and theoretical guarantees. In this paper, we introduce a hybrid GP/NN-based framework for operator learning that leverages the strengths of both methods. Instead of approximating the function-valued operator $\mathcal{G}^\dagger$ directly, we use a GP to approximate its associated real-valued bilinear form $\widetilde{\mathcal{G}}^\dagger: \mathcal{U}\times\mathcal{V}^*\rightarrow\mathbb{R}$, defined by $\widetilde{\mathcal{G}}^\dagger(u,\varphi) := [\varphi,\mathcal{G}^\dagger(u)]$. This allows us to recover the operator through $\mathcal{G}^\dagger(u)(y)=\widetilde{\mathcal{G}}^\dagger(u,\delta_y)$, where $\delta_y$ denotes point evaluation (the Dirac delta) at $y$. The GP mean function can be zero or parameterized by a neural operator, and for each setting we develop a robust training mechanism based on maximum likelihood estimation (MLE) that can optionally leverage the underlying physics. Numerical benchmarks show that the framework (1) improves the performance of a base neural operator when that operator is used as the mean function of a GP, and (2) enables accurate zero-shot data-driven predictions without prior training. The framework also handles multi-output operators $\mathcal{G}^\dagger:\mathcal{U} \rightarrow\prod_{s=1}^S\mathcal{V}^s$ and benefits from computational speed-ups via product kernel structures and Kronecker product matrix representations.
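Evaluating the bilinear form at $\varphi=\delta_y$ turns operator learning into scalar regression: each training pair contributes observations $v_i(y_j)$ indexed by input features $\phi(u_i)$ and an output location $y_j$. The following is a minimal, hypothetical sketch (not the authors' implementation) of a GP with a product kernel $k_u(\phi(u),\phi(u'))\,k_y(y,y')$ over such data. It assumes that every $u_i$ is observed at the same $q$ output points, squared-exponential kernels $\exp(-\sum_k \beta_k (a_k-b_k)^2)$, a single signal variance with homoscedastic noise, and an optional precomputed prior mean (e.g., a neural operator's predictions). The names `sq_exp_kernel` and `KroneckerGP` are invented for this example. Under these assumptions the training covariance is $\sigma^2 K_u \otimes K_y + \sigma_n^2 I$, so eigendecompositions of the two small factors yield both the posterior mean and the negative log marginal likelihood used in MLE without forming the $Nq \times Nq$ matrix.

```python
import numpy as np


def sq_exp_kernel(A, B, beta):
    """Squared-exponential kernel exp(-sum_k beta_k (a_k - b_k)^2) (assumed form).

    A: (n, d) array, B: (m, d) array, beta: scalar or (d,) roughness parameters.
    """
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (A.shape[1],))
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(beta * diff**2, axis=-1))


class KroneckerGP:
    """Hypothetical GP over (phi(u), y) with product kernel sigma2 * k_u * k_y.

    Assumes all N input functions are observed at the same q output points,
    so the training covariance is sigma2 * kron(K_u, K_y) + noise * I.
    """

    def __init__(self, Phi, Y, V, beta_u, beta_y, sigma2=1.0, noise=1e-6, mean=None):
        # Phi: (N, p) features phi(u_i); Y: (q, dim_y) output points; V: (N, q) values v_i(y_j)
        # mean: optional (N, q) prior-mean values at the training data (zero mean if None)
        self.Phi, self.Y = Phi, Y
        self.beta_u, self.beta_y, self.sigma2 = beta_u, beta_y, sigma2
        R = V if mean is None else V - mean
        lam_u, Q_u = np.linalg.eigh(sq_exp_kernel(Phi, Phi, beta_u))
        lam_y, Q_y = np.linalg.eigh(sq_exp_kernel(Y, Y, beta_y))
        # Eigenvalues of sigma2 * kron(K_u, K_y) + noise * I, laid out as an (N, q) grid
        self.denom = sigma2 * np.outer(lam_u, lam_y) + noise
        # Residuals rotated into the joint eigenbasis: Q_u^T R Q_y
        self.R_rot = Q_u.T @ R @ Q_y
        # alpha = covariance^{-1} vec(R), stored as an (N, q) matrix (row-major vec)
        self.alpha = Q_u @ (self.R_rot / self.denom) @ Q_y.T

    def predict(self, Phi_new, Y_new, mean_new=None):
        """Posterior mean at every combination of new inputs and new output points."""
        Ku_star = sq_exp_kernel(Phi_new, self.Phi, self.beta_u)  # (n_new, N)
        Ky_star = sq_exp_kernel(self.Y, Y_new, self.beta_y)      # (q, m_new)
        out = self.sigma2 * Ku_star @ self.alpha @ Ky_star       # (n_new, m_new)
        return out if mean_new is None else out + mean_new

    def neg_log_likelihood(self):
        """Negative log marginal likelihood, the objective minimized in MLE."""
        quad = np.sum(self.R_rot**2 / self.denom)
        logdet = np.sum(np.log(self.denom))
        return 0.5 * (quad + logdet + self.denom.size * np.log(2.0 * np.pi))


# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
Phi = rng.uniform(size=(20, 8))                             # 20 inputs sampled at 8 points
Y = np.linspace(0.0, 1.0, 16)[:, None]                      # 16 shared output locations
V = np.sin(3.0 * Y.T) * Phi.mean(axis=1, keepdims=True)     # toy targets, shape (20, 16)
gp = KroneckerGP(Phi, Y, V, beta_u=1.0, beta_y=10.0, noise=1e-4)
print(gp.neg_log_likelihood(), gp.predict(Phi[:2], Y).shape)
```

For $N$ training functions and $q$ output points, this replaces an $O((Nq)^3)$ dense solve with $O(N^3 + q^3)$ eigendecompositions plus $O(Nq(N+q))$ matrix products; `neg_log_likelihood` could be handed to an optimizer over $\beta_u$, $\beta_y$, $\sigma^2$, and the noise level.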

Paper Structure

This paper contains 24 sections, 48 equations, 10 figures, and 3 tables.

Introduction
Description of the Operator Learning Problem
Summary of the Proposed Approach
Illustrative Example
Review of Related Literature
Contributions and Article Outline
Proposed Framework for Operator Learning
Data-driven Operator Learning
Single-output Operators
Multi-output Operators
Inference
Physics-informed Operator Learning
Parameter Initialization and Stability
Effect of Observation Operator
High-Dimensional Features in GPs
...and 9 more sections

Figures (10)

Figure 1: Diagram of our framework for operator learning: We convert the operator learning problem to a regression one which can be solved via GPs. For multi-output operator learning, we use multi-response or multi-task GPs.
Figure 2: Data-driven and physics-informed operator learning for the Burgers' problem with Dirichlet BCs: The training dataset has $400$ pairs $\{ u_i, v_i \}$ that satisfy the Burgers' equation with Dirichlet boundary conditions. The observation operators $\phi$ and $\psi$ sample $u_i$ and $v_i$ at $p=100$ and $q=12^2$ collocation points, respectively. Leveraging the physics reduces the prediction error. A DeepONet is used as the mean function of the GP in both the data-driven and physics-informed cases.
Figure 3: Effect of $\beta_y$ on test accuracy with dense observations: Very small and very large $\beta_y$ values yield poor performance because they cause numerical issues and fail to capture spatial correlations, respectively. However, a relatively wide range of values, such as $10^3$, provides high accuracy and numerical stability.
Figure 4: Interaction between $\beta$ and feature space dimensionality: Smaller $\beta$ values are needed to achieve a desired correlation value in high-dimensional feature spaces.
Figure 5: Loss and error profiles as functions of kernel parameters in zero-mean GPs: A wide range of parameter combinations provides optimal loss and error values, indicating that proper initialization is relatively easy and results in good performance even without training. $\beta_y = 10^{3}$ is used for the two loss profiles on the left-hand side of (a) and (b). We do not show the dependence of the test relative $L_2$ error on $\sigma^2_\phi$ since this parameter does not affect the expected value of the posterior prediction for a test sample. For Darcy, $\beta_y$ has two components, so for plotting the error map we assume that these components are equal (a similar assumption is made for $\beta_\phi$).
...and 5 more figures
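The captions of Figures 3 and 4 attribute the sensitivity to $\beta$ to two mechanisms: small $\beta$ makes the kernel matrix nearly singular, and in high-dimensional feature spaces typical squared distances grow with the dimension $d$, so $\beta$ must shrink to keep correlations at a given level. The following is a small numerical illustration of both effects, assuming a squared-exponential kernel $\exp(-\beta\|x-x'\|^2)$ and, for the second part, inputs drawn uniformly from $[0,1]^d$, for which $\mathbb{E}\|x-x'\|^2 = d/6$. The function names, the 12-point grid, and the target correlation of 0.5 are hypothetical choices for illustration, not settings taken from the paper.

```python
import numpy as np


def kernel_condition_number(beta, q=12):
    """Condition number of exp(-beta * (y - y')^2) on q equispaced points in [0, 1]."""
    y = np.linspace(0.0, 1.0, q)
    K = np.exp(-beta * (y[:, None] - y[None, :]) ** 2)
    return np.linalg.cond(K)


def beta_for_correlation(rho, d):
    """beta such that exp(-beta * E||x - x'||^2) = rho for x, x' ~ Uniform([0, 1]^d).

    Uses E||x - x'||^2 = d / 6 (each coordinate difference has variance 1/12 + 1/12).
    """
    return -6.0 * np.log(rho) / d


def mean_sq_distance(d, n=20000, seed=0):
    """Monte Carlo estimate of E||x - x'||^2 for uniform points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    x, xp = rng.uniform(size=(2, n, d))
    return np.mean(np.sum((x - xp) ** 2, axis=1))


# Small beta -> nearly all-ones kernel matrix -> very large condition number
for beta in (1e-1, 1e1, 1e3):
    print(f"beta = {beta:g}: cond(K) = {kernel_condition_number(beta):.3e}")

# Higher feature dimension -> smaller beta for the same typical correlation
for d in (1, 10, 100):
    print(f"d = {d}: beta = {beta_for_correlation(0.5, d):.4f}, "
          f"MC E||x-x'||^2 = {mean_sq_distance(d):.2f} (theory {d / 6:.2f})")
```

The sketch reproduces only the qualitative trends described in the captions; it says nothing about the accuracy values reported in the figures.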
