Generalization Bounds and Statistical Guarantees for Multi-Task and Multiple Operator Learning with MNO Networks

Adrien Weihs, Hayden Schaeffer

Abstract

Multiple operator learning concerns learning operator families $\{G[\alpha]:U\to V\}_{\alpha\in W}$ indexed by an operator descriptor $\alpha$. Training data are collected hierarchically by sampling operator instances $\alpha$, then input functions $u$ per instance, and finally evaluation points $x$ per input, yielding noisy observations of $G[\alpha][u](x)$. While recent work has developed expressive multi-task and multiple operator learning architectures and approximation-theoretic scaling laws, quantitative statistical generalization guarantees remain limited. We provide a covering-number-based generalization analysis for separable models, focusing on the Multiple Neural Operator (MNO) architecture: we first derive explicit metric-entropy bounds for hypothesis classes given by linear combinations of products of deep ReLU subnetworks, and then combine these complexity bounds with approximation guarantees for MNO to obtain an explicit approximation-estimation tradeoff for the expected test error on new (unseen) triples $(\alpha,u,x)$. The resulting bound makes the dependence on the hierarchical sampling budgets $(n_\alpha,n_u,n_x)$ transparent and yields an explicit learning-rate statement in the operator-sampling budget $n_\alpha$, providing a sample-complexity characterization for generalization across operator instances. The structure and architecture can also be viewed as a general-purpose solver or an example of a "small" PDE foundation model, where the triples are one form of multi-modality.
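To make the separable hypothesis class concrete, here is a minimal PyTorch sketch (not the paper's implementation) of a model of the form $G[\alpha][u](x) \approx \sum_{k=1}^K a_k(\alpha)\, b_k(u)\, t_k(x)$, i.e., a linear combination of products of deep ReLU subnetworks. The class name `SeparableMNO` and all dimensions and hyperparameters (`K`, `width`, `depth`) are illustrative assumptions, and the input function $u$ is assumed to be discretized on a fixed grid.

```python
import torch
import torch.nn as nn

def relu_mlp(d_in: int, width: int, depth: int, d_out: int) -> nn.Sequential:
    """Feedforward ReLU network (the subnetwork class in the spirit of Definition 2.1)."""
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

class SeparableMNO(nn.Module):
    """Separable model: G[alpha][u](x) ~ sum_k a_k(alpha) * b_k(u) * t_k(x),
    a linear combination of K products of deep ReLU subnetworks.
    All dimensions and hyperparameters here are illustrative assumptions."""

    def __init__(self, d_alpha: int, d_u: int, d_x: int,
                 K: int = 32, width: int = 128, depth: int = 4):
        super().__init__()
        self.net_alpha = relu_mlp(d_alpha, width, depth, K)  # operator-descriptor branch
        self.net_u = relu_mlp(d_u, width, depth, K)          # input-function branch (u on a fixed grid)
        self.net_x = relu_mlp(d_x, width, depth, K)          # evaluation-point trunk

    def forward(self, alpha, u, x):
        # Elementwise product over the K components, then a sum: the
        # "linear combination of products" structure from the abstract.
        return (self.net_alpha(alpha) * self.net_u(u) * self.net_x(x)).sum(dim=-1)

# Example: a batch of 10 triples (alpha, u, x) -> 10 scalar predictions of G[alpha][u](x).
model = SeparableMNO(d_alpha=4, d_u=64, d_x=1)
y_hat = model(torch.randn(10, 4), torch.randn(10, 64), torch.randn(10, 1))  # shape (10,)
```

One subnetwork processes the operator descriptor, one the discretized input function, and one the query point; evaluating at a new triple $(\alpha,u,x)$ is a single forward pass, and the separable structure is what lets covering numbers of the full class be controlled through the three subnetwork classes.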

Paper Structure

This paper contains 36 sections, 11 theorems, 184 equations, 1 figure, and 1 table.

Key Result

Theorem 1.1

Let $G: W \to \{G[\alpha]: U \to V\}_{\alpha \in W}$ be a Lipschitz multiple operator map from the function space $W$ into Lipschitz operators from $U$ to $V$. Assume that we observe sampled noisy data $y_{\ell i j} = G[\alpha_\ell][u_{\ell i}](x_{\ell i j}) + \zeta_{\ell i j}$, where $\zeta_{\ell i j}$ denotes observation noise. For every $\varepsilon > 0$, there exists an MNO trained on $\{y_{\ell i j}\}$ whose expected test error on unseen triples $(\alpha,u,x)$ is at most $\varepsilon$.
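The hierarchical data model in the theorem can be illustrated with a short NumPy sketch that draws $n_\alpha$ operator instances, then $n_u$ input functions per instance, then $n_x$ evaluation points per input, and records the noisy values $y_{\ell i j}$ (compare Figure 1). The toy operator family `G`, the sinusoidal input functions, and the Gaussian noise (one sub-Gaussian instance) are all illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(alpha, u, x):
    """Toy stand-in for the ground-truth operator family: G[alpha][u](x) = alpha * u(x)**2."""
    return alpha * u(x) ** 2

n_alpha, n_u, n_x, sigma = 8, 16, 32, 0.05  # hierarchical sampling budgets and noise level

data = []
for _ in range(n_alpha):                      # level 1: operator instances alpha_l
    alpha = rng.uniform(-1.0, 1.0)
    for _ in range(n_u):                      # level 2: input functions u_{l i} per instance
        a, b = rng.normal(size=2)
        u = lambda x, a=a, b=b: a * np.sin(x) + b * np.cos(x)
        x_pts = rng.uniform(0.0, 2.0 * np.pi, size=n_x)        # level 3: points x_{l i j}
        y = G(alpha, u, x_pts) + sigma * rng.normal(size=n_x)  # noisy observations y_{l i j}
        data.append((alpha, (a, b), x_pts, y))

# data holds n_alpha * n_u * n_x noisy samples of G[alpha][u](x): the training set an MNO fits.
```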

Figures (1)

  • Figure 1: Schematic structure of the training dataset $S_{G, \{y_s\}, \{c_s\}}$ defined in Definition \ref{def:trainingSet} for multiple operator learning. $\mathrm{subG}(\sigma^2)$ denotes a sub-Gaussian distribution with variance proxy $\sigma^2$. Blue arrows indicate discretization steps; black arrows represent the flow of data.

Theorems & Definitions (35)

  • Theorem 1.1: Generalization error for MNO
  • Definition 2.1: Feedforward ReLU network class
  • Definition 2.2: $\mathrm{MNO}$ Architecture
  • Example 2.3: Homogeneous kernels with parameter-dependent interaction radius
  • Example 2.4: Variable-order fractional kernel operators
  • Example 2.5: Green-kernel representation of a parameterized PDE solution operator
  • Example 2.6: Nonlinear PDE solution operator with a shared semigroup structure
  • Example 2.7: PROSE architecture
  • Theorem 2.8: Multiple Operator Scaling Laws
  • Corollary 2.9: Clipped network scaling laws
  • ...and 25 more