Towards Size-Independent Generalization Bounds for Deep Operator Nets

Pulkit Gopalani, Sayar Karmakar, Dibyakanti Kumar, Anirbit Mukherjee

TL;DR

This work aims to advance the theory of measuring out-of-sample error when training DeepONets, which are among the most versatile ways to solve P.D.E. systems in one shot.

Abstract

In recent times, machine learning methods have made significant advances in becoming a useful tool for analyzing physical systems. A particularly active area in this theme has been "physics-informed machine learning", which focuses on using neural nets for numerically solving differential equations. In this work, we aim to advance the theory of measuring out-of-sample error when training DeepONets, which are among the most versatile ways to solve P.D.E. systems in one shot. Firstly, for a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. Secondly, we use this to show how the Huber loss can be chosen so that, for these DeepONet classes, generalization error bounds can be obtained that have no explicit dependence on the size of the nets. The effective capacity measure for DeepONets that we thus derive is also shown to correlate with the behavior of generalization error in experiments.
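
Since the abstract leans on the Huber loss and its threshold $\delta$, a minimal sketch of the loss may help. It uses the standard textbook form (quadratic for residuals up to $\delta$, linear beyond), which may differ in normalization from the paper's own Definition 3. The function names `huber_loss` and `empirical_huber_loss` are illustrative, not taken from the paper.

```python
from typing import Sequence


def huber_loss(residual: float, delta: float) -> float:
    """Textbook Huber loss: quadratic near zero, linear in the tails.

    Assumption: this is the common normalization, not necessarily
    the exact one used in the paper.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    # Linear branch, shifted so the two pieces meet at |residual| = delta.
    return delta * (magnitude - 0.5 * delta)


def empirical_huber_loss(
    predictions: Sequence[float], targets: Sequence[float], delta: float
) -> float:
    """Average Huber loss over paired predictions and targets."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need equally many predictions and targets, at least one")
    total = sum(huber_loss(p - t, delta) for p, t in zip(predictions, targets))
    return total / len(predictions)
```

Smaller values of `delta` make the loss behave like a scaled absolute error over more of its range, while larger values make it behave like the squared error.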

Paper Structure (35 sections, 17 theorems, 93 equations, 7 figures)

This paper contains 35 sections, 17 theorems, 93 equations, 7 figures.

Key Result

Theorem 1

Consider a class of DeepONets, with absolute value activation, whose branch and trunk networks are both of depth $n$ (i.e. $q_B = q_T = n$ in equation def:informal) and suppose that the squared norms of the inputs to them are bounded in expectation by $M_{x,B}$ and $M_{x,T}$ respectively. Then the a where the constants $\mathcal{C}_{n,n-1}$ and $\mathcal{C}_{-k,-k}, ~k = 2,3,\ldots,n-1$ are define
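
To make the object in the theorem concrete, here is a rough sketch of a DeepONet forward pass with absolute value activation. It assumes the usual DeepONet form, in which the output is the inner product of a branch net (fed sensor samples of the input function) and a trunk net (fed the query point). The bias-free layers, the absence of an activation on the last layer, and all names (`abs_net`, `deeponet`) are assumptions made for illustration and may not match the paper's Definition 1.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(weight: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def abs_net(weights: Sequence[Matrix], x: Sequence[float]) -> List[float]:
    """Bias-free feed-forward net; |.| is applied after every layer but the last.

    Assumption: the paper's class may place activations differently.
    """
    hidden = list(x)
    for weight in weights[:-1]:
        hidden = [abs(v) for v in matvec(weight, hidden)]
    return matvec(weights[-1], hidden)


def deeponet(
    branch_weights: Sequence[Matrix],
    trunk_weights: Sequence[Matrix],
    sensor_values: Sequence[float],
    query_point: Sequence[float],
) -> float:
    """Inner product of the branch and trunk outputs."""
    branch_out = abs_net(branch_weights, sensor_values)
    trunk_out = abs_net(trunk_weights, query_point)
    if len(branch_out) != len(trunk_out):
        raise ValueError("branch and trunk outputs must have equal dimension")
    return sum(b * t for b, t in zip(branch_out, trunk_out))


# Tiny depth-2 example with hand-picked weights (purely illustrative).
branch = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 0.0], [0.0, 2.0]]]
trunk = [[[-2.0], [1.0]], [[1.0, 1.0], [1.0, -1.0]]]
value = deeponet(branch, trunk, sensor_values=[2.0, 3.0], query_point=[0.5])
```

In this sketch the branch and trunk nets have equal depth, mirroring the $q_B = q_T = n$ setting of the theorem with $n = 2$.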

Figures (7)

  • Figure 1: A Sketch of the DeepONet ("DON") Architecture
  • Figure 2: The above plot shows the behaviour of the measured generalization error with respect to $\frac{{\mathcal{C}}_{3,2} \, \tilde{{\mathcal{C}}}_{-2,-2}}{\sqrt{m}}$ for training DeepONets, to solve the Burgers' PDE, with empirical loss as given in equation \ref{eq:exploss}, specialized to the Huber loss (Definition \ref{def:huber}) for the stated values of $\delta$ and for the branch and the trunk nets being of depth $3$. Each point is labelled by the number of training data used in that experiment.
  • Figure 3: The above plot shows the behaviour of the measured generalization error with respect to $\frac{{\mathcal{C}}_{3,2} \, \tilde{{\mathcal{C}}}_{-2,-2}}{\sqrt{m}}$ for training DeepONets, to solve the $2$D-Heat PDE, with empirical loss as given in equation \ref{eq:exploss2}, specialized to the Huber loss (Definition \ref{def:huber}) for the stated values of $\delta$ and for the branch and the trunk nets being of depth $3$. Each point is labelled by the number of training data used in that experiment.
  • Figure 4: This plot demonstrates the ability of Huber loss trained DeepONets to predict the solution to a Burgers' P.D.E. on an arbitrarily chosen inhomogeneous term $u$, which is different from the $u$'s used for training the net.
  • Figure 5: These plots demonstrate the behaviour of generalization error and test error for Huber loss trained DeepONets to predict the solution to a Burgers' P.D.E. at varying values of $\delta$ for $2$ different dataset sizes.
  • ...and 2 more figures

Theorems & Definitions (38)

  • Theorem: Informal Statement of Theorem \ref{thm:absdon}
  • Theorem: Informal Statement of Theorem \ref{thm:genbound}
  • Definition 1: A DeepONet (Version 1)
  • Definition 2
  • Definition 3: Huber Loss
  • Definition 4: A Loss Function for DeepONets
  • Definition 5: Empirical and Average Rademacher complexity
  • Theorem 4.1: Rademacher Complexity of Special Symmetric DeepONets
  • Theorem 4.2: Generalization Error Bound for DeepONet
  • Lemma 6.1
  • ...and 28 more
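
As a companion to the Rademacher complexity definition listed above, the following sketch computes the empirical Rademacher complexity of a small finite function class exactly, by enumerating every sign vector. It uses the common convention $\frac{1}{m}\,\mathbb{E}_{\sigma}\sup_{f}\sum_{i}\sigma_i f(x_i)$; the paper's Definition 5 may differ (for instance by an absolute value or a constant factor). The function name `empirical_rademacher` is hypothetical.

```python
from itertools import product
from typing import Sequence


def empirical_rademacher(function_values: Sequence[Sequence[float]]) -> float:
    """Exact empirical Rademacher complexity of a finite function class.

    function_values[j][i] holds f_j(x_i) for the j-th function and i-th sample.
    The cost grows as 2**m, so this is only usable for tiny sample sizes m.
    """
    if not function_values or not function_values[0]:
        raise ValueError("need at least one function and one sample")
    m = len(function_values[0])
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=m):
        # Supremum over the class of the sign-weighted sum of function values.
        total += max(
            sum(s * v for s, v in zip(signs, values)) for values in function_values
        )
    # Average over all 2**m sign vectors, then normalize by the sample size.
    return total / (2 ** m * m)
```

A class containing a single function has complexity zero under this convention, since the sign-weighted sums cancel on average; richer classes can align with more sign patterns and score higher.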