Size Lowerbounds for Deep Operator Networks

Anirbit Mukherjee; Amartya Roy

TL;DR

The paper establishes a data-dependent lower bound on the DeepONet size needed to drive empirical training error below a noise-driven threshold in operator learning for PDEs. It proves that the common output dimension $q$ of the branch and trunk nets must satisfy $q = \Omega\left(n^{1/4}\right)$ under general conditions, where $n$ is the number of training data points, linking architecture size to available training data. Specializing to sigmoid-ended networks and running experiments on the advection-diffusion-reaction (ADR) PDE, it suggests that, at a fixed model size, increasing $q$ lowers training error monotonically only if the training data grows at least quadratically with $q$, which points to a possible scaling law for DeepONets. These results inform practical design choices for neural operators and highlight directions for extending the theory to PDE-specific structures and more refined bounds.
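The arithmetic behind the two scalings can be sketched in a few lines of Python. This is only an illustration: the constant `c` hidden by the $\Omega$ notation and the value of `ratio` are hypothetical placeholders, not values taken from the paper, and the helper names are ours.

```python
import math


def min_q_order(n: int, c: float = 1.0) -> float:
    """Order of the lower bound q = Omega(n^(1/4)).

    `c` is a hypothetical constant; the paper's bound fixes it only
    up to problem-dependent factors that are not reproduced here.
    """
    return c * n ** 0.25


def n_for_fixed_ratio(q: int, ratio: float) -> int:
    """Training set size n that keeps q / sqrt(n) equal to `ratio`.

    Solving q / sqrt(n) = ratio gives n = (q / ratio)^2, so n grows
    quadratically with q. `ratio` is an assumed experimental knob.
    """
    return math.ceil((q / ratio) ** 2)


if __name__ == "__main__":
    for q in (10, 20, 40):
        n = n_for_fixed_ratio(q, ratio=0.5)
        print(f"q={q:3d}  n={n:6d}  n^(1/4)={min_q_order(n):.2f}")
```

Under a fixed $q/\sqrt{n}$ ratio, doubling $q$ quadruples $n$, which is the quadratic data growth the summary refers to.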

Abstract

Deep Operator Networks are an increasingly popular paradigm for solving regression in infinite dimensions and hence for solving families of PDEs in one shot. In this work, we aim to establish a first-of-its-kind data-dependent lower bound on the size of DeepONets required for them to be able to reduce empirical error on noisy data. In particular, we show that for low training errors to be obtained on $n$ data points it is necessary that the common output dimension of the branch and the trunk net scale as $\Omega\left(\sqrt[4]{n}\right)$. This inspires our experiments with DeepONets solving the advection-diffusion-reaction PDE, where we demonstrate the possibility that, at a fixed model size, to leverage an increase in this common output dimension and get monotonic lowering of training error, the size of the training data might necessarily need to scale at least quadratically with it.
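To make "the common output dimension of the branch and the trunk net" concrete, here is a minimal sketch of a DeepONet forward pass in plain Python. It assumes the standard DeepONet form, in which the prediction is the inner product of a $q$-dimensional branch output (computed from sensor values of the input function) and a $q$-dimensional trunk output (computed from the query point). Each net is reduced to a single sigmoid layer for brevity; the layer sizes, the initialization, and all function names are illustrative assumptions, not the architecture used in the paper's experiments.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(out_dim, in_dim, rng):
    """Random weights and biases for one dense layer (illustrative init)."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return weights, bias


def dense_sigmoid(weights, bias, x):
    """One dense layer followed by sigmoid gates."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def deeponet_forward(branch, trunk, sensor_values, query_point):
    """Inner product of the q-dimensional branch and trunk outputs."""
    b = dense_sigmoid(*branch, sensor_values)  # encodes the input function
    t = dense_sigmoid(*trunk, query_point)     # encodes the query location
    return sum(bi * ti for bi, ti in zip(b, t))


if __name__ == "__main__":
    rng = random.Random(0)
    q, num_sensors = 4, 3                      # q is the common output dimension
    branch = init_layer(q, num_sensors, rng)
    trunk = init_layer(q, 1, rng)
    print(deeponet_forward(branch, trunk, [0.1, 0.2, 0.3], [0.5]))
```

In this simplified sketch both nets end in sigmoid gates, so every coordinate lies in $(0, 1)$ and the output is confined to $(0, q)$. That is a property of this toy setup, which can be checked directly, rather than a restatement of the paper's proof.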

Paper Structure (24 sections, 10 theorems, 62 equations, 8 figures, 3 tables)

Introduction
The Formal Setup of DeepONets
Review of the Universal Approximation Property of DeepONets
Related Works
Organization
Our Setup
An Example of a DeepONet Loss
The Main Theorem
Lemmas Towards Proving Theorem \ref{thm:mainhoeffding}
Proofs of the Lemmas
Proof of Lemma \ref{lemma 1}
Proof of Lemma \ref{lemma 3}
Proof of Lemma \ref{lemma 2}
Proof of Lemma \ref{lemma 4}
Proof of the (Main) Theorem \ref{thm:mainhoeffding}
...and 9 more sections

Key Result

Theorem 1.1

Suppose one considers a DeepONet function class with a fixed bound on the weights and on the total number of parameters, where both the branch and the trunk nets end in a layer of sigmoid gates. Then, with high probability over sampling an $n$-sized training data set, if this class has to have a predictor

Figures (8)

Figure 1: A Sketch of the DeepONet Architecture
Figure 2: Training Loss vs Epoch in fixed $\frac{q}{\sqrt{n}}$ setting
Figure 3: Training Loss vs Epoch in fixed $\frac{q}{n^{\frac{2}{3}}}$ setting
Figure 4: ($D$ and $k$ both set to $1$) Left: Training Loss vs Epoch in fixed $\frac{q}{\sqrt{n}}$ setting. Right: Training Loss vs Epoch in fixed $\frac{q}{n^{\frac{2}{3}}}$ setting.
Figure 5: ($D$ and $k$ both set to $0.1$) Left: Training Loss vs Epoch in fixed $\frac{q}{\sqrt{n}}$ setting. Right: Training Loss vs Epoch in fixed $\frac{q}{n^{\frac{2}{3}}}$ setting.
...and 3 more figures

Theorems & Definitions (18)

Theorem 1.1: Informal Statement of Theorem \ref{thm:DNN}
Theorem 1.2
Theorem 2.1
Definition 1
Definition 2
Definition 3
Lemma 3.1
Definition 4: Defining $J$
Theorem 4.1
Theorem 4.2
...and 8 more
