Towards Size-Independent Generalization Bounds for Deep Operator Nets

Pulkit Gopalani; Sayar Karmakar; Dibyakanti Kumar; Anirbit Mukherjee

TL;DR

This work aims to advance the theory of measuring out-of-sample error when training DeepONets, which are among the most versatile methods for solving P.D.E. systems in one shot.

Abstract

In recent times, machine learning methods have made significant advances towards becoming useful tools for analyzing physical systems. A particularly active area within this theme is "physics-informed machine learning", which focuses on using neural nets to numerically solve differential equations. In this work, we aim to advance the theory of measuring out-of-sample error when training DeepONets, which are among the most versatile methods for solving P.D.E. systems in one shot. Firstly, for a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. Secondly, we use this to show how the Huber loss can be chosen so that, for these DeepONet classes, generalization error bounds can be obtained that have no explicit dependence on the size of the nets. The effective capacity measure for DeepONets that we thus derive is also shown to correlate with the behavior of generalization error in experiments.
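For readers unfamiliar with the Huber loss mentioned in the abstract, the following is a minimal sketch of its textbook form: quadratic for residuals of magnitude at most $\delta$ and linear beyond that. The function name `huber_loss` and the exact normalization are assumptions made for illustration; the paper's own Huber loss definition may differ in constants.

```python
import numpy as np

def huber_loss(residual, delta):
    """Textbook Huber loss (assumed form, not taken from the paper).

    0.5 * r^2                   if |r| <= delta
    delta * (|r| - 0.5 * delta) otherwise
    """
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    # The two branches agree at |r| = delta, so the loss is continuous.
    return np.where(r <= delta, quadratic, linear)
```

Because the loss grows only linearly for large residuals, it is Lipschitz with constant $\delta$, which is the kind of property a Rademacher-complexity argument can exploit even when the data are unbounded.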

Paper Structure (35 sections, 17 theorems, 93 equations, 7 figures)

Introduction
Motivations for Understanding Rademacher Complexity
Overview of Training DeepONets & Our Main Results
Choosing the Loss Function
Explaining Overparameterization
Related Works
Comparison to sidd_don
Notation
The Mathematical Setup
Results
First Main Result : Rademacher Complexity of DeepONets
Second Main Result : Size-Independent Generalization Error Bound for DeepONets Trained via a Huber Loss on Unbounded Data
An Experimental Exploration of the Proven Rademacher Complexity Bound for DeepONets
Burgers' PDE
Heat PDE
...and 20 more sections

Key Result

Theorem 1

Consider a class of DeepONets, with absolute value activation, whose branch and trunk networks are both of depth $n$ (i.e. $q_B = q_T = n$ in equation def:informal) and suppose that the squared norms of the inputs to them are bounded in expectation by $M_{x,B}$ and $M_{x,T}$ respectively. Then the a ... where the constants $\mathcal{C}_{n,n-1}$ and $\mathcal{C}_{-k,-k}, ~k = 2,3,\ldots,n-1$ are define...
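To make the setting of the theorem concrete, here is a minimal sketch of a DeepONet forward pass in which the branch and trunk networks use the absolute value activation and the output is the inner product of their final-layer outputs. Everything in it is an assumption for illustration: the helper names `abs_net` and `deeponet`, the absence of bias terms, and the choice to leave the last layer of each net linear are not taken from the paper's formal definition.

```python
import numpy as np

def abs_net(x, weights):
    """Feed-forward net with |.| activation after every layer except the last.

    `weights` is a list of matrices; bias terms are omitted (an assumption).
    """
    h = np.asarray(x, dtype=float)
    for W in weights[:-1]:
        h = np.abs(W @ h)
    return weights[-1] @ h

def deeponet(u_sensors, y, branch_weights, trunk_weights):
    """Evaluate the operator output at location y for input function u.

    u_sensors: values of the input function at fixed sensor points.
    y:         the query point in the solution domain.
    The branch and trunk outputs must have the same dimension.
    """
    b = abs_net(u_sensors, branch_weights)
    t = abs_net(y, trunk_weights)
    return float(b @ t)

# Tiny hand-checkable example with depth-2 branch and trunk nets.
branch = [np.array([[1.0, -1.0], [0.0, 2.0]]), np.eye(2)]
trunk = [np.array([[-1.0], [0.5]]), np.eye(2)]
value = deeponet([1.0, 2.0], [2.0], branch, trunk)
```

In this example the branch net maps the sensor values to the vector (1, 4), the trunk net maps the query point to (2, 1), and their inner product is the prediction. The theorem concerns how the capacity of such a class depends on norms of the weight matrices rather than on their widths.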

Figures (7)

Figure 1: A Sketch of the DeepONet ("DON") Architecture
Figure 2: The above plot shows the behaviour of the measured generalization error with respect to $\frac{\mathcal{C}_{3,2} \, \tilde{\mathcal{C}}_{-2,-2}}{\sqrt{m}}$ for training DeepONets, to solve the Burgers' PDE, with empirical loss as given in equation \ref{eq:exploss}, specialized to the Huber loss (Definition \ref{def:huber}) for the stated values of $\delta$ and for the branch and the trunk nets being of depth $3$. Each point is labelled by the number of training data used in that experiment.
Figure 3: The above plot shows the behaviour of the measured generalization error with respect to $\frac{\mathcal{C}_{3,2} \, \tilde{\mathcal{C}}_{-2,-2}}{\sqrt{m}}$ for training DeepONets, to solve the $2$D-Heat PDE, with empirical loss as given in equation \ref{eq:exploss2}, specialized to the Huber loss (Definition \ref{def:huber}) for the stated values of $\delta$ and for the branch and the trunk nets being of depth $3$. Each point is labelled by the number of training data used in that experiment.
Figure 4: This plot demonstrates the ability of Huber loss trained DeepONets to predict the solution to a Burgers' P.D.E. on an arbitrarily chosen inhomogeneous term $u$, which is different from the $u$'s used for training the net.
Figure 5: These plots demonstrate the behaviour of generalization error and test error for Huber loss trained DeepONets to predict the solution to a Burgers' P.D.E. at varying values of $\delta$ for $2$ different dataset sizes.
...and 2 more figures

Theorems & Definitions (38)

Theorem: Informal Statement of Theorem \ref{thm:absdon}
Theorem: Informal Statement of Theorem \ref{thm:genbound}
Definition 1: A DeepONet (Version 1)
Definition 2
Definition 3: Huber Loss
Definition 4: A Loss Function for DeepONets
Definition 5: Empirical and Average Rademacher complexity
Theorem 4.1: Rademacher Complexity of Special Symmetric DeepONets
Theorem 4.2: Generalization Error Bound for DeepONet
Lemma 6.1
...and 28 more
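
For orientation, the empirical and average Rademacher complexity named in the list above are usually defined as follows. This is the standard textbook form, stated under the assumption that the paper's definition matches it up to notation; the symbols $\mathcal{F}$, $S$, $z_i$ and $\epsilon_i$ are chosen here for illustration.

```latex
% Function class \mathcal{F}, sample S = (z_1, \ldots, z_m) drawn i.i.d. from a distribution D,
% and independent Rademacher variables \epsilon_i uniform on \{-1, +1\}.
\hat{\mathcal{R}}_S(\mathcal{F})
  = \mathbb{E}_{\epsilon}\left[ \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \epsilon_i \, f(z_i) \right],
\qquad
\mathcal{R}_m(\mathcal{F})
  = \mathbb{E}_{S \sim D^m}\left[ \hat{\mathcal{R}}_S(\mathcal{F}) \right].
```

The first quantity measures how well the class can correlate with random signs on a fixed sample, and the second averages that over samples of size $m$.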
