Thompson Sampling in Function Spaces via Neural Operators

Rafael Oliveira, Xuesong Wang, Kian Ming A. Chai, Edwin V. Bonilla

TL;DR

This work extends Thompson sampling to optimization over function spaces by treating neural operators as surrogates for unknown solution operators ${G}_*:{\mathcal A}\to{\mathcal U}$ and optimizing known functionals $f:{\mathcal U}\to\mathbb{R}$ of their outputs. It provides a sample-then-optimize framework (NOTS) that avoids explicit posterior uncertainty quantification by leveraging the infinite-width GP correspondence of neural operators via the conjugate kernel, yielding sublinear Bayesian regret in the finite-domain setting. The authors establish a theoretical bridge between neural operators and Gaussian processes for operator-valued kernels, derive regret guarantees for NOTS, and validate the approach on PDE benchmarks (Darcy flow and shallow-water) where functionals of the operator output are optimized. The results show significant sample-efficiency improvements over GP-based and neural TS baselines, highlighting NOTS's scalability to high-dimensional, function-valued inputs and outputs with practical PDE applications.

Abstract

We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator's output. We assume that queries to the operator (such as running a high-fidelity simulator or physical experiment) are costly, while functional evaluations on the operator's output are inexpensive. Our algorithm employs a sample-then-optimize approach using neural operator surrogates. This strategy avoids explicit uncertainty quantification by treating trained neural operators as approximate samples from a Gaussian process (GP) posterior. We derive regret bounds and theoretical results connecting neural operators with GPs in infinite-dimensional settings. Experiments benchmark our method against other Bayesian optimization baselines on functional optimization tasks involving partial differential equations of physical systems, demonstrating better sample efficiency and significant performance gains.
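
The sample-then-optimize loop described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: the toy `true_operator`, the `objective` functional, the finite candidate set, and in particular the surrogate, which is a random-feature model with a single hidden layer whose last linear layer is fitted in closed form by regularized least squares around its random initialization. That closed-form fit stands in for training a neural operator by gradient descent.

```python
import numpy as np

def true_operator(a):
    """Hypothetical expensive operator G_*: maps a discretized input function to an output function."""
    return np.cumsum(np.sin(a), axis=-1) / a.shape[-1]

def objective(u):
    """Hypothetical known, cheap functional f: the mean value of the output function."""
    return u.mean(axis=-1)

def sample_surrogate(rng, a_obs, u_obs, d_in, d_out, width, reg=1e-2):
    """Draw a fresh random-feature model and fit only its last layer to the data.

    The last layer solves a least-squares problem regularized towards its random
    initialization w0, so the fitted model plays the role of an approximate
    posterior sample (an assumption of this sketch, not a claim about the paper's training).
    """
    w_h = rng.normal(size=(width, d_in)) / np.sqrt(d_in)            # fixed hidden layer
    phi = lambda a: np.maximum(a @ w_h.T, 0.0) / np.sqrt(width)      # ReLU features
    w0 = rng.normal(size=(width, d_out))                             # random last-layer init
    if len(a_obs) == 0:
        return lambda a: phi(a) @ w0                                 # prior sample only
    feats = phi(np.asarray(a_obs))                                   # shape (n, width)
    resid = np.asarray(u_obs) - feats @ w0                           # shape (n, d_out)
    gram = feats.T @ feats + reg * np.eye(width)
    w = w0 + np.linalg.solve(gram, feats.T @ resid)
    return lambda a: phi(a) @ w

def nots_sketch(n_rounds=20, n_candidates=200, d=16, width=256, seed=0):
    """Run the loop over a finite candidate set and return the per-round regret."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, d))
    best = objective(true_operator(candidates)).max()   # known only because this is a toy problem
    a_obs, u_obs, regret = [], [], []
    for _ in range(n_rounds):
        surrogate = sample_surrogate(rng, a_obs, u_obs, d, d, width)
        idx = int(np.argmax(objective(surrogate(candidates))))  # optimize f over the sampled surrogate
        a_t = candidates[idx]
        u_t = true_operator(a_t)                                 # the costly query
        a_obs.append(a_t)
        u_obs.append(u_t)
        regret.append(best - objective(u_t))
    return np.array(regret)
```

Calling `nots_sketch()` returns one regret value per round. No uncertainty estimate is computed anywhere in the loop: exploration comes only from redrawing the random initialization each round, which is the point of the sample-then-optimize strategy.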

Paper Structure

This paper contains 65 sections, 7 theorems, 73 equations, 6 figures, 1 table, 2 algorithms.

Key Result

proposition 1

Let ${G}_{\boldsymbol{\theta}}: \mathcal{A} \to \mathcal{U}$ be a neural operator with a single hidden layer, where $\mathcal{U} \subseteq \mathcal{L}^{2}(\nu)$ is closed and $\nu$ is a finite Borel measure on $\mathcal{Z}$. Assume $\mathbf{w}_o \sim \mathcal{N}(\ldots)$ [...] where ${K}_{G}: \mathcal{A} \times \mathcal{A} \to \mathcal{L}(\mathcal{U})$ is defined [...]

Figures (6)

  • Figure 1: Darcy flow rate optimization. Cumulative regret (top left) and average regret (top right) across trials for the negative total flow rate case in the Darcy flow problem. The shaded areas correspond to one standard deviation across 10 trials. The input-output function pairs that achieved the best and worst flow rates are shown at the bottom. White regions ($a(x)=1$) denote fully open permeability and black regions ($a(x)=0$) denote impermeable pore material. The output function is the pressure field, where brighter colors indicate higher pressure.
  • Figure 2: Averaged cumulative regret for the Darcy flow pressure and potential energy optimization problems. The shaded areas correspond to one standard deviation across 10 trials.
  • Figure 3: Shallow water inverse problem. Overlay of cumulative regret (left) and its average (right) metrics across trials for the inverse problem in the shallow water data. The shaded areas correspond to one standard deviation across 10 trials.
  • Figure 4: Cumulative regret across trials for the Darcy flow rate optimization problem with only the last linear layer of a single-hidden-layer FNO trained via full-batch gradient descent for NOTS (labeled as SNOTS). All our results were averaged over 10 independent trials, and shaded areas represent $\pm 1$ standard deviation.
  • Figure 5: Cumulative regret across trials for the Darcy flow total pressure optimization problem with only the last linear layer of a single-hidden-layer FNO trained via full-batch gradient descent for NOTS (labeled as SNOTS).
  • ...and 1 more figures
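
The regret curves and shaded bands reported in these captions can be computed as in the short sketch below. It assumes a maximization problem with instantaneous regret defined as the best achievable objective value minus the value obtained at each round, and a population standard deviation across trials; the helper names `regret_curves` and `mean_and_band` are hypothetical, and the paper's exact conventions may differ.

```python
import numpy as np

def regret_curves(f_values, f_best):
    """Cumulative regret and its running average for one trial.

    f_values: objective value obtained at each round.
    f_best:   best achievable objective value (assumed known for the benchmark).
    """
    inst = f_best - np.asarray(f_values, dtype=float)   # instantaneous regret per round
    cumulative = np.cumsum(inst)
    average = cumulative / np.arange(1, len(inst) + 1)
    return cumulative, average

def mean_and_band(curves):
    """Mean curve and a +/- one standard deviation band across trials.

    curves: array-like of shape (n_trials, n_rounds).
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    std = curves.std(axis=0)                             # population std (an assumption)
    return mean, mean - std, mean + std
```

For example, a trial that obtains values 0.0, 0.5 and 1.0 against a best value of 1.0 has cumulative regret 1.0, 1.5, 1.5 and average regret 1.0, 0.75, 0.5. A flattening cumulative curve, or equivalently an average that decays towards zero, is the signature of sublinear regret.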

Theorems & Definitions (13)

  • proposition 1
  • proposition 2
  • definition 1: Multi-Layer Fully-Connected Neural Network
  • lemma 1: Infinite-width limit [Hanin2023nngp]
  • lemma 2: Thm. 3.1 in [Takeno2024]
  • remark 1
  • lemma 3: Continuity of limiting GP
  • proof
  • proposition 2
  • proof: Proof of Proposition \ref{thr:kernel}
  • ...and 3 more