Ensemble and Mixture-of-Experts DeepONets For Operator Learning

Ramansh Sharma, Varun Shankar

TL;DR

The paper tackles operator learning for PDEs by enriching DeepONet trunks through an ensemble approach and a spatially local PoU-MoE trunk. It formalizes an ensemble DeepONet with multiple trunks and a single branch, and proves that both the ensemble and its PoU-MoE variant, which blends local trunks via partition-of-unity weights, are universal approximators. Empirical results on 2D and 3D problems with sharp gradients show 2–4x lower relative $\ell_2$ errors than standard DeepONets, with the POD-PoU ensemble often delivering the best accuracy on challenging tasks, at the cost of increased training time. These findings highlight the value of combining global and local basis functions and motivate further work on adaptive partitioning and parallel implementations to manage computational overhead while preserving accuracy gains.

Abstract

We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets and POD-DeepONets on both standard and challenging new operator learning problems involving partial differential equations (PDEs) in two and three dimensions. Our new PoU-MoE formulation provides a natural way to incorporate spatial locality and model sparsity into any neural network architecture, while our new ensemble DeepONet provides a powerful and general framework for incorporating basis enrichment in scientific machine learning architectures for operator learning.
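The partition-of-unity blending described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the patches are balls with hypothetical centers and radii, the weight function is a Wendland-type compactly supported bump, and each local "expert" trunk is an untrained random MLP. The sketch only shows how normalized weights blend local trunks as $\tau(y) = \sum_k w_k(y)\,\tau_k(y)$.

```python
import numpy as np

def wendland_c2(r):
    """Compactly supported bump: positive for r < 1, zero for r >= 1 (assumed weight function)."""
    r = np.clip(r, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def pou_weights(y, centers, radii):
    """Normalized partition-of-unity weights, shape (n_points, n_patches)."""
    # Distance from every point to every patch center, scaled by the patch radius.
    dist = np.linalg.norm(y[:, None, :] - centers[None, :, :], axis=-1) / radii[None, :]
    raw = wendland_c2(dist)
    total = raw.sum(axis=1, keepdims=True)
    # Points outside every patch get zero weight instead of a division by zero.
    return np.divide(raw, total, out=np.zeros_like(raw), where=total > 0)

def make_mlp(rng, sizes):
    """Random tanh-MLP parameters; a stand-in for a trained local trunk."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Apply the MLP: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def pou_moe_trunk(y, centers, radii, experts):
    """Blend local expert trunks: tau(y) = sum_k w_k(y) * tau_k(y)."""
    w = pou_weights(y, centers, radii)                            # (n, K)
    outs = np.stack([mlp_apply(p, y) for p in experts], axis=1)   # (n, K, p)
    return np.einsum("nk,nkp->np", w, outs)                       # (n, p)

# Hypothetical setup: four overlapping patches covering the unit square.
rng = np.random.default_rng(0)
centers = np.array([[0.25, 0.25], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]])
radii = np.full(4, 0.6)
experts = [make_mlp(rng, [2, 16, 8]) for _ in centers]
y = rng.uniform(0.0, 1.0, size=(50, 2))
weights = pou_weights(y, centers, radii)
basis = pou_moe_trunk(y, centers, radii, experts)
```

For simplicity, this sketch evaluates every expert at every point. Because the weights vanish outside each patch, an implementation that exploits the sparsity would evaluate an expert only on the points inside its patch.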

Paper Structure

This paper contains 33 sections, 2 theorems, 21 equations, 7 figures, and 6 tables.

Key Result

Theorem 1

Let $\mathcal{G}: \mathcal{U} \to \mathcal{V}$ be a continuous operator. Define $\hat{\mathcal{G}}$ as $\hat{\mathcal{G}}(u)(y) = \left\langle \hat{\boldsymbol{\tau}}(y; \theta_{\boldsymbol{\tau}_1}; \theta_{\boldsymbol{\tau}_2}; \theta_{\boldsymbol{\tau}_3}), \hat{\boldsymbol{\beta}}(u;\theta_b) \right\rangle$ [...] where $\epsilon > 0$ can be made arbitrarily small and $\|\cdot\|_{\mathcal{V}}$ is the norm of th [...]
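The inner-product form in the theorem statement can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the authors' code. It assumes the ensemble trunk $\hat{\boldsymbol{\tau}}$ is the concatenation of three individual trunk outputs and that the single branch output has matching total width. All three trunks and the branch are untrained random MLPs standing in for the real components, and the widths and sensor count are hypothetical.

```python
import numpy as np

def make_mlp(rng, sizes):
    """Random tanh-MLP parameters; a stand-in for a trained network."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Apply the MLP: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def ensemble_deeponet(u_sensors, y, branch, trunks):
    """Evaluate <concat_j tau_j(y), beta(u)> for all pairs (u, y).

    u_sensors: (n_u, n_sensors) input functions sampled at fixed sensors.
    y:         (n_y, dim) query locations.
    Returns an (n_u, n_y) array of predicted output values.
    """
    # Assumed form of the ensemble trunk: concatenate the trunk outputs.
    tau = np.concatenate([mlp_apply(t, y) for t in trunks], axis=-1)  # (n_y, p_total)
    beta = mlp_apply(branch, u_sensors)                               # (n_u, p_total)
    return beta @ tau.T

# Hypothetical sizes: three trunks with widths 8, 8, and 4; one shared branch.
rng = np.random.default_rng(0)
n_sensors, dim = 20, 2
widths = [8, 8, 4]
trunks = [make_mlp(rng, [dim, 16, p]) for p in widths]
branch = make_mlp(rng, [n_sensors, 32, sum(widths)])
u = rng.normal(size=(5, n_sensors))
y = rng.uniform(size=(30, dim))
pred = ensemble_deeponet(u, y, branch, trunks)
```

Under the concatenation assumption, the prediction is the sum of one inner product per trunk, each using its own slice of the branch output. This is one way to read "multiple trunks and a single branch".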

Figures (7)

  • Figure 1: An ensemble DeepONet containing a POD trunk and a PoU-MoE trunk.
  • Figure 2: Enriched bases on the 2D reaction-diffusion problem. The solutions exhibit sharp gradients (left); the PoU-MoE trunk has learned spatially-localized basis functions (middle); the POD trunk has learned a global basis function (right).
  • Figure 3: The 2D lid-driven cavity flow problem. We show in (A) an example input function; in (B) an example output function component; in (C) the four patches used for the PoU-MoE trunk; in (D), (E), and (F) the spatial mean squared error (MSE) for the vanilla, ensemble vanilla-POD-PoU, and ensemble POD-PoU DeepONets respectively (see "Error calculations" in the paper's results section for how MSE is computed for vector-valued functions).
  • Figure 4: The 2D reaction-diffusion problem. We show in (A) an example input function; in (B) an example output function; in (C) the six patches used for the PoU-MoE trunk; in (D), (E), and (F) the spatial mean squared error (MSE) for the vanilla, ensemble $(P+1)$-vanilla, and ensemble POD-PoU DeepONets respectively.
  • Figure 5: Vanilla-DeepONet and PoU (from POD-PoU ensemble DeepONet) basis functions for the largest branch modes on the 2D reaction-diffusion problem.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof