Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation

J. Antonio Lara Benitez; Takashi Furuya; Florian Faucher; Anastasis Kratsios; Xavier Tricoche; Maarten V. de Hoop

TL;DR

This work tackles out-of-distribution (OOD) generalization for neural operators applied to high-frequency Helmholtz-type PDEs. It introduces a transformer-inspired sequential neural operator with identity skip connections and optional stochastic depth (sNO+$\varepsilon$I), which improves both in-distribution and out-of-distribution performance, especially for the wave-speed-to-solution map. The authors derive risk bounds for neural-operator families under Gaussian measures on Banach spaces, relate these bounds to Rademacher complexities, and show that the bounds depend on the stochastic depth, with a special focus on the entropy of the Cameron–Martin space. They further propose a hypernetwork-based surrogate for the forward operator that makes repeated forward solves cheap, which supports practical use in inverse problems and Bayesian workflows. Together, the experiments and the theory offer a principled path toward robust neural-operator models for high-frequency wave propagation.

Abstract

Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator mapping wave speed to solution, or boundary values for the Helmholtz equation on a bounded domain. The latter operator is commonly referred to as the "forward" operator in the study of inverse problems. Our methodology draws inspiration from transformers and techniques such as stochastic depth. Our experiments reveal certain surprises in the generalization and the relevance of introducing stochastic depth. Our NOs show superior performance as compared with standard NOs, not only for testing within the training distribution but also for out-of-distribution scenarios. To delve into this observation, we offer an in-depth analysis of the Rademacher complexity associated with our modified models and prove an upper bound tied to their stochastic depth that existing NOs do not satisfy. Furthermore, we obtain a novel out-of-distribution risk bound tailored to Gaussian measures on Banach spaces, again relating stochastic depth with the bound. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator.
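
The abstract refers to skip connections combined with stochastic depth. As a rough illustration only, the sketch below shows the generic stochastic-depth rule for a residual stack: during training each layer is kept with some survival probability, and at test time its branch is scaled by that probability. It is not the authors' sNO+$\varepsilon$I layer; the names `residual_block`, `forward`, and `survival_prob`, and the toy layers, are assumptions made for this example.

```python
import numpy as np

def residual_block(x, layer, survival_prob, rng, training):
    """Generic stochastic-depth residual update (not the paper's exact layer)."""
    if training:
        # Keep the layer with probability survival_prob; otherwise pass x through unchanged.
        keep = rng.random() < survival_prob
        return x + layer(x) if keep else x
    # At test time every layer is active, with its branch scaled by the survival probability.
    return x + survival_prob * layer(x)

def forward(x, layers, survival_probs, rng=None, training=False):
    """Apply a stack of residual blocks, each with its own survival probability."""
    rng = np.random.default_rng() if rng is None else rng
    for layer, p in zip(layers, survival_probs):
        x = residual_block(x, layer, p, rng, training)
    return x

# Toy usage: three hypothetical pointwise "layers" acting on a sampled function.
layers = [np.tanh, np.sin, lambda v: 0.5 * v]
probs = [1.0, 0.8, 0.6]
x0 = np.linspace(-1.0, 1.0, 5)
y_eval = forward(x0, layers, probs, training=False)
y_train = forward(x0, layers, probs, rng=np.random.default_rng(0), training=True)
```

Because the skip path always carries `x` forward, dropping a layer during training leaves the signal intact, which is what makes random layer removal compatible with deep stacks.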

Paper Structure (71 sections, 16 theorems, 145 equations, 33 figures, 17 tables)

Introduction
Proposed networks: "Metaforming the neural operator"
Neural Operator: standard structure
Sequential neural operators (${\textit{s}}{\text{NO}}$)
${\textit{s}}{\text{NO}}+\varepsilon \mathrm{I}$: ${\textit{s}}{\text{NO}}$ with the identity map (skip connection)
${\textit{s}}{\text{NO}}+\varepsilon \mathrm{I}$ without stochastic depth
${\textit{s}}{\text{NO}}+\varepsilon \mathrm{I}$ with stochastic depth
Parametric time-harmonic wave equations, forward operator and data generation
Time-harmonic wave equations
Wave speed to solution map, $\boldsymbol{\mathcal{G}} : c \mapsto {\mathrm{p}}$
Forward operator $\boldsymbol{\mathcal{F}}^{f}_\omega : (c,f,\omega) \mapsto \{ {\mathrm{p}}(\boldsymbol{x}_j,\omega,f) \}_{j=1,\ldots,n_{\mathrm{rcv}}}$
Wave speeds as Gaussian random fields (GRF). Whittle–Matérn field
Training and testing in-distribution for $\boldsymbol{\mathcal{G}}$
Neural network "prediction" of the wavefield
Hyperparameters of the neural networks
...and 56 more sections

Key Result

Theorem 7.3

Suppose that either of Assumption2 or Assumption3 holds, that the small ball function $\psi$ satisfies Assumption ass:Small_Ball_Regularity, and that there is an $\varepsilon\ge 0$ such that the coupling condition eq:coupling_condition holds. Then there exists a constant $C_{\mu}$ such that the generalization bound holds with probability at least $1-\delta$, where $\bar{L}:=L_{\ell}\max\{1,L^{\star}\}\max\{1,L_{\…

Figures (33)

Figure 1: ${\textit{s}}{\text{NO}}$ is called sequential because the integral operator is followed by an MLP in a sequential manner. For comparison with the NO, see the figure labeled fig:NO.
Figure 2: ${\textit{s}}{\text{NO}}+\varepsilon \mathrm{I}$ without stochastic depth. It is a modification of the sequential structure in which we incorporate layer normalization and skip connections, as in transformers. For comparison with the NO, see the figure labeled fig:NO.
Figure 3: Illustration of the domain $D$: a Dirichlet boundary condition is imposed on the top (red line, $\Gamma_1$), while absorbing boundary conditions are imposed elsewhere (blue line, $\Gamma_2$). The source ($\star$) is typically positioned near the surface.
Figure 4: Illustration of the full-wave dataset for the experiment that considers a computational domain of size 1.27$\times$1.27 with a source near the surface. The wave speed and pressure field are represented on a Cartesian grid of size 64$\times$64 with a grid step of 20 m. The complete dataset consists of 50000 pairs, each made up of a wave speed model and its associated acoustic wavefield.
Figure 5: Illustration of the forward operator experiment that considers a computational domain of size 1.27$\times$1.27 with $64$ sources near the surface and $128$ receivers located slightly beneath the sources. The wavefield illustration represents the "matrix" response: each row corresponds to a source, and each column to the pressure field recorded along the receiver line.
...and 28 more figures

Theorems & Definitions (52)

Remark 3.1
Remark 4.1
Remark 4.2
Remark 5.1
Remark 6.1
Definition 7.1: Gaussian measure
Theorem 7.3: Out-of-distributional generalization bounds for the NO and ${\textit{s}}{\text{NO}}+\varepsilon \mathrm{I}$v2 hypothesis classes
Lemma 7.4: Estimates on small ball functions for Gaussian sheets in uniform topology [mason2001small]
Example 7.1: Estimate on the standard Brownian sheet on $[0,1]^2$ [kuelbs1993metric]
Example 7.2: Change of distribution
...and 42 more
