A Deep Learning Framework for Multi-Operator Learning: Architectures and Approximation Theory

Adrien Weihs, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer

TL;DR

The work addresses the challenge of learning operators between function spaces by introducing two architectures, MONet and MNO, that handle families of operators (multi-operator learning), and by providing a rigorous universal approximation framework for continuous, integrable, and Lipschitz operator classes. It derives scaling laws that quantify how network capacity must grow to achieve a target accuracy, with explicit rates for Lipschitz operators, and outlines how approximation order and architectural balance affect efficiency when several single operators are learned. The theory is complemented by numerical experiments on parametric PDE benchmarks, where MONet and MNO demonstrate strong expressive power and computational efficiency relative to existing neural-operator baselines. Overall, the paper establishes a unified, scalable foundation for neural operator learning across operator collections, connecting expressivity guarantees with practical design guidance for scientific applications.

Abstract

While many problems in machine learning focus on learning mappings between finite-dimensional spaces, scientific applications require approximating mappings between function spaces, i.e., operators. We study the problem of learning collections of operators and provide both theoretical and empirical advances. We distinguish between two regimes: (i) multiple operator learning, where a single network represents a continuum of operators parameterized by a parametric function, and (ii) learning several distinct single operators, where each operator is learned independently. For the multiple operator case, we introduce two new architectures, $\mathrm{MNO}$ and $\mathrm{MONet}$, and establish universal approximation results in three settings: continuous, integrable, or Lipschitz operators. For the latter, we further derive explicit scaling laws that quantify how the network size must grow to achieve a target approximation accuracy. For learning several single operators, we develop a framework for balancing architectural complexity across subnetworks and show how approximation order determines computational efficiency. Empirical experiments on parametric PDE benchmarks confirm the strong expressive power and efficiency of the proposed architectures. Overall, this work establishes a unified theoretical and practical foundation for scalable neural operator learning across multiple operators.

Paper Structure

This paper contains 39 sections, 11 theorems, 165 equations, 7 figures, 15 tables.

Key Result

Theorem 2.3

Suppose that Assumptions A1, S2, and S3 hold. Let $G$ be a nonlinear continuous operator mapping $U$ to $V$. Then, for any $\varepsilon > 0$, there exists a neural network of the form defined in Eq. \eqref{eq:chenchen} such that

Figures (7)

  • Figure 1: MONet architecture: The $\alpha$ function is the input for the parameter-approximation network. The $u$ function is the input for the function-approximation network. The spatial values $x\in\Omega_V$ are the input for the space-approximation network.
  • Figure 2: MNO architecture: The $\alpha$ function is the input for the parameter-approximation network. The $u$ function is the input for the function-approximation network. The spatial values $x\in\Omega_V$ are the input for the space-approximation network.
  • Figure 3: Representative solution for conservation laws: The target solution (left) and error maps for DeepONet, DeepONet-C, MIONet, $\mathrm{MONet}$, $\mathrm{MNO}$-S and $\mathrm{MNO}$-L. The instance-specific relative errors are 5.49%, 4.82%, 2.92%, 5.11%, 2.52% and 1.81%, respectively, aligning with the trends observed in Table \ref{tab:conresult}.
  • Figure 4: Representative solution for diffusion-reaction-advection equation: The target solution (left) and error maps for DeepONet, DeepONet-C, MIONet, $\mathrm{MONet}$, $\mathrm{MNO}$, and $\mathrm{MNO}$-L. The instance-specific relative errors are 15.02%, 5.26%, 6.27%, 5.26%, 3.08% and 2.38%, respectively, aligning with the trends observed in Table \ref{tab:draresult}.
  • Figure 5: Representative solution for the nonlinear Klein-Gordon equation: The target solution (left) and error maps for DeepONet, DeepONet-C, MIONet, $\mathrm{MONet}$, $\mathrm{MNO}$-S, and $\mathrm{MNO}$-L. The instance-specific relative errors are 19.84%, 7.99%, 14.26%, 6.06%, 3.76% and 2.34%, respectively, aligning with the trends observed in Table \ref{tab:nkgresult}.
  • ...and 2 more figures
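
The captions of Figures 1 and 2 describe three subnetworks: a parameter-approximation network fed by $\alpha$, a function-approximation network fed by $u$, and a space-approximation network fed by $x\in\Omega_V$. The sketch below illustrates that three-branch layout in plain Python. It is not the paper's MONet or MNO definition (Definitions 3.1 and 3.3 are not reproduced on this page). The class name `ThreeBranchOperatorNet`, the sampling of $\alpha$ and $u$ at fixed sensor points, the tanh activations, the layer widths, and the product-and-sum combination rule (borrowed from DeepONet/MIONet-style models) are all assumptions made for illustration.

```python
import math
import random


def init_mlp(sizes, rng):
    """Build a list of (weights, biases) layers for the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp(layers, x):
    """Apply the MLP to vector x: tanh on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


class ThreeBranchOperatorNet:
    """Hypothetical three-branch model (illustrative, not the paper's definition).

    Assumption: alpha and u are represented by their values at fixed sensor
    points, and the three branch outputs are combined by an elementwise
    product followed by a sum, as in DeepONet/MIONet-style models.
    """

    def __init__(self, n_alpha, n_u, dim_x, width=16, p=8, seed=0):
        rng = random.Random(seed)
        self.param_net = init_mlp([n_alpha, width, p], rng)  # input: alpha samples
        self.func_net = init_mlp([n_u, width, p], rng)       # input: u samples
        self.space_net = init_mlp([dim_x, width, p], rng)    # input: query point x

    def __call__(self, alpha_samples, u_samples, x):
        a = mlp(self.param_net, alpha_samples)
        b = mlp(self.func_net, u_samples)
        t = mlp(self.space_net, x)
        # Assumed combination rule: sum_k a_k * b_k * t_k.
        return sum(ak * bk * tk for ak, bk, tk in zip(a, b, t))


# Usage: evaluate the untrained network at one query point.
sensors = [i / 9 for i in range(10)]
alpha = [1.0 + 0.5 * s for s in sensors]          # samples of a parameter function
u = [math.sin(2 * math.pi * s) for s in sensors]  # samples of an input function
net = ThreeBranchOperatorNet(n_alpha=10, n_u=10, dim_x=1)
value = net(alpha, u, [0.3])
print(value)
```

In this sketch the parameter branch selects which operator in the family is being applied, so a single set of weights covers the whole continuum of operators indexed by $\alpha$. Training code, the loss, and the architectural differences between MONet and MNO are deliberately left out because they are not specified here.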

Theorems & Definitions (33)

  • Definition 2.1: Tauber–Wiener functions
  • Definition 2.2: $\mathrm{Net}$ Network
  • Theorem 2.3: Universal Approximation Theorem for Single Operator [ChenChen1995]
  • Theorem 2.4: Universal Approximation for Functions [ChenChen1995]
  • Lemma 2.5: Finite-dimensional Approximations of Function Spaces [ChenChen1995]
  • Definition 2.6: Feedforward ReLU Network Class
  • Theorem 2.7: Function Approximation [liu2024neuralscalinglawsdeep]
  • Definition 3.1: $\mathrm{MONet}$ Network
  • Remark 3.2: $\mathrm{MONet_{vect}}$ Network
  • Definition 3.3: $\mathrm{MNO}$ Network
  • ...and 23 more