Optimal deep learning of holomorphic operators between Banach spaces

Ben Adcock; Nick Dexter; Sebastian Moraga

TL;DR

This work tackles the problem of learning operators between Banach spaces, in contrast to the vast majority of past works, which consider only Hilbert spaces. It focuses on learning holomorphic operators, an important class of problems with many applications.

Abstract

Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works, which consider only Hilbert spaces. We focus on learning holomorphic operators, an important class of problems with many applications. We combine arbitrary approximate encoders and decoders with standard feedforward Deep Neural Network (DNN) architectures (specifically, those with constant width exceeding the depth) under standard $\ell^2$-loss minimization. We first identify a family of DNNs such that the resulting Deep Learning (DL) procedure achieves optimal generalization bounds for such operators. For standard fully-connected architectures, we then show that there are uncountably many minimizers of the training problem that yield equivalent optimal performance. The DNN architectures we consider are "problem agnostic": their width and depth depend only on the amount of training data $m$ and not on regularity assumptions about the target operator. Next, we show that DL is optimal for this problem: no recovery procedure can surpass these generalization bounds, up to log terms. Finally, we present numerical results demonstrating the practical performance of this approach on challenging problems, including the parametric diffusion, Navier-Stokes-Brinkman and Boussinesq PDEs.
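
As a rough illustration of the procedure described above, the following NumPy sketch composes a hypothetical approximate encoder, a constant-width tanh feedforward network whose width exceeds its depth, and a hypothetical approximate decoder, then evaluates the empirical $\ell^2$-loss over $m$ training pairs. The encoder (truncation to the first $d_{\mathcal{X}}$ coefficients), the identity decoder, the Euclidean norm in the loss, the initialization, and all sizes and data are assumptions made for illustration; this is not the paper's construction, and the training step (minimizing the loss) is omitted.

```python
import numpy as np


def init_tanh_dnn(d_in, d_out, width, depth, rng):
    """Constant-width fully-connected net: `depth` hidden layers of `width` neurons."""
    if width <= depth:
        raise ValueError("this sketch mirrors the width-exceeds-depth setting")
    sizes = [d_in] + [width] * depth + [d_out]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def tanh_dnn(params, z):
    """Forward pass: tanh on hidden layers, affine output layer; z has shape (batch, d_in)."""
    for W, b in params[:-1]:
        z = np.tanh(z @ W.T + b)
    W, b = params[-1]
    return z @ W.T + b


def l2_loss(params, encode, decode, X, Y):
    """Empirical loss (1/m) * sum_i ||Y_i - decode(N(encode(X_i)))||_2^2."""
    pred = decode(tanh_dnn(params, encode(X)))
    return float(np.mean(np.sum((Y - pred) ** 2, axis=1)))


rng = np.random.default_rng(0)
m, d_X, d_Y = 50, 4, 10
X = rng.uniform(-1.0, 1.0, size=(m, 20))  # hypothetical discretized inputs
Y = rng.normal(size=(m, d_Y))             # hypothetical discretized outputs


def encode(X):
    """Hypothetical approximate encoder: keep the first d_X coefficients."""
    return X[:, :d_X]


def decode(z):
    """Hypothetical approximate decoder: identity on R^{d_Y}."""
    return z


params = init_tanh_dnn(d_X, d_Y, width=8, depth=3, rng=rng)
print(l2_loss(params, encode, decode, X, Y))
```

Passing the encoder and decoder as plain functions keeps the network itself independent of how inputs and outputs are discretized, which loosely mirrors the abstract's "arbitrary approximate encoders and decoders".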

Paper Structure (53 sections, 26 theorems, 358 equations, 9 figures)

Introduction
Contributions
Relation to previous work
Notation, assumptions, setup and examples
Notation
Assumptions and setup
Discussion of assumptions
Main results I: upper bounds
Main results II: lower bounds
Numerical experiments
Conclusions and limitations
Experimental setup
Formulation of the learning problems
Computational setup for the numerical experiments
Description of the parametric PDEs used in the numerical experiments and their discretization
...and 38 more sections

Key Result

Theorem 3.1

Let $m \geq 3$, $\delta > 0$, $0 < \epsilon < 1$ and $L = L(m,\epsilon) = \log^4(m) + \log(1/\epsilon)$. Then there exists a class $\mathcal{N}$ of hyperbolic tangent (tanh) DNNs $N : \mathbb{R}^{d_{\mathcal{X}}} \rightarrow \mathbb{R}^{d_{\mathcal{Y}}}$, depending only on $m$ and $\epsilon$, with […] such that the following holds. Suppose that Assumption [ass:main-ass] holds and where $\mathcal{I}_{\mathc
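
The depth $L(m,\epsilon) = \log^4(m) + \log(1/\epsilon)$ in Theorem 3.1 depends only on $m$ and $\epsilon$. The short sketch below evaluates it, assuming natural logarithms and reading $\log^4(m)$ as $(\log m)^4$; the excerpt does not state the base, and rounding up to a whole number of layers is an illustrative choice rather than something taken from the theorem.

```python
import math


def depth_L(m: int, eps: float) -> float:
    """L(m, eps) = log(m)**4 + log(1/eps), as in Theorem 3.1.

    Assumption: natural logarithms (the base is not stated in this excerpt).
    """
    if m < 3:
        raise ValueError("Theorem 3.1 assumes m >= 3")
    if not 0.0 < eps < 1.0:
        raise ValueError("Theorem 3.1 assumes 0 < eps < 1")
    return math.log(m) ** 4 + math.log(1.0 / eps)


# Depth grows only polylogarithmically in m; rounding up to a whole
# number of layers is an illustrative choice, not taken from the paper.
for m in (100, 1_000, 10_000):
    print(m, math.ceil(depth_L(m, 1e-2)))
```

Because `depth_L` takes only `m` and `eps` as arguments, nothing about the regularity of the target operator enters the architecture, which is the "problem agnostic" property described in the abstract.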

Figures (9)

Figure 1: Elliptic diffusion equation. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the solution operator of the elliptic diffusion equation [eq:Poisson]. The first two plots use the affine coefficient $a_{1,d}$ [eq:affine_param_p2] with $d=4,8$, respectively. The remaining plots use the log-transformed coefficient $a_{2,d}$ [eq:affine_param_p5].
Figure 2: NSB equations. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the velocity field $\bm{u}$ of the NSB problem [eq:NSB]. See Fig. [NSB_res-additional] for results for the pressure component $p$. The diffusion coefficients $a_{1,d},a_{2,d}$ and $d = 4,8$ are as in Fig. 1.
Figure 3: Boussinesq equation. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the temperature $\varphi$ of the Boussinesq problem [eq:BSQ] (see Fig. [Boussinesq_res-additional] for $\bm{u}$ and $p$). The diffusion coefficients $a_{1,d},a_{2,d}$ and $d = 4,8$ are as in Fig. 1. In this example, we also consider an additional parametric dependence in the tensor $\mathbb{K} = \mathbb{K}_d$ describing the thermal conductivity of the fluid. See § [app:parametric_Boussinesq] and [eq:affine_param_p3].
Figure 4: The domain $\Omega$ and FE mesh for the parametric diffusion equation.
Figure 5: The solution $\bm{u}(\bm{x})$ of the parametric Poisson problem [eq:Poisson] for the parameter $\bm{x}=(1,0,0,0)^{\top}$ with affine coefficient $a_{1,d}$ and $d=4$, using a total of $K=2622$ DoF. The left plot shows the solution computed by the FEM solver. The right plot shows the ELU $4\times 40$ DNN approximation after $60,000$ epochs of training with $m=500$ training sample points.
...and 4 more figures
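
Figures 1 to 3 report an average relative $L^2_{\mu}(\mathcal{X};\widetilde{\mathcal{Y}})$-norm error, and Figure 5 shows an ELU $4\times 40$ DNN with $K=2622$ DoF. The NumPy sketch below illustrates both under stated assumptions: it reads "$4\times 40$" as 4 hidden layers of 40 neurons, assumes the network maps a $d=4$ parameter to the $K$ finite element coefficients, and estimates the relative error by Monte Carlo over test samples as $\sqrt{\sum_i \|Y_i - \hat{Y}_i\|^2 / \sum_i \|Y_i\|^2}$, with an optional Gram (e.g. mass) matrix defining the discrete norm. Taking the arithmetic mean over trials is also an assumption; the excerpt does not say how the average is computed. The weights are untrained and purely illustrative.

```python
import numpy as np


def elu(z, alpha=1.0):
    """ELU activation: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))


def layer_sizes(d_in, d_out, depth, width):
    """Assumed reading of 'depth x width': `depth` hidden layers of `width` neurons."""
    return [d_in] + [width] * depth + [d_out]


def num_params(sizes):
    """Number of weights plus biases in a fully-connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def init_params(sizes, rng):
    """Random weights scaled by 1/sqrt(fan-in) and zero biases (illustrative only)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def elu_dnn(params, z):
    """Forward pass with ELU hidden layers and an affine output layer."""
    for W, b in params[:-1]:
        z = elu(z @ W.T + b)
    W, b = params[-1]
    return z @ W.T + b


def relative_l2_error(Y_true, Y_pred, G=None):
    """Monte Carlo estimate of the relative L^2_mu error over n_test samples.

    Rows of Y_true and Y_pred are discretized outputs, shape (n_test, K).
    G: optional SPD (K, K) Gram matrix, ||v||^2 = v^T G v; Euclidean if None.
    """
    diff = Y_true - Y_pred
    if G is None:
        num, den = np.sum(diff ** 2), np.sum(Y_true ** 2)
    else:
        num = np.einsum("ik,kl,il->", diff, G, diff)
        den = np.einsum("ik,kl,il->", Y_true, G, Y_true)
    return float(np.sqrt(num / den))


def average_relative_error(trials, G=None):
    """Arithmetic mean of the relative error over (Y_true, Y_pred) trial pairs."""
    return float(np.mean([relative_l2_error(Yt, Yp, G) for Yt, Yp in trials]))


sizes = layer_sizes(d_in=4, d_out=2622, depth=4, width=40)
print(num_params(sizes))  # -> 112622

rng = np.random.default_rng(0)
params = init_params(sizes, rng)        # untrained, illustrative weights
x = np.array([[1.0, 0.0, 0.0, 0.0]])    # parameter value used in Figure 5
print(elu_dnn(params, x).shape)         # -> (1, 2622)
```

Under this reading, most of the parameters sit in the final layer (40 hidden units times 2622 outputs, plus biases), so the output discretization, rather than the hidden width, dominates the model size.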

Theorems & Definitions (61)

Definition 2.1: Holomorphic map
Remark 1: Holomorphy assumption
Theorem 3.1: Existence of good DNN architectures
Theorem 3.2: Fully-connected DNN architectures are good
Theorem 4.1: Optimal $L^2$ error rates
Theorem 4.2: Optimal $L^{\infty}$ error rates
Remark 2: Other auxiliary variables
Remark 3
Remark 4
Lemma D.1: Nikolskii inequality for polynomials
...and 51 more
