Optimal deep learning of holomorphic operators between Banach spaces

Ben Adcock, Nick Dexter, Sebastian Moraga

TL;DR

This work tackles the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces, and focuses on learning holomorphic operators - an important class of problems with many applications.

Abstract

Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces. We focus on learning holomorphic operators - an important class of problems with many applications. We combine arbitrary approximate encoders and decoders with standard feedforward Deep Neural Network (DNN) architectures - specifically, those with constant width exceeding the depth - under standard $\ell^2$-loss minimization. We first identify a family of DNNs such that the resulting Deep Learning (DL) procedure achieves optimal generalization bounds for such operators. For standard fully-connected architectures, we then show that there are uncountably many minimizers of the training problem that yield equivalent optimal performance. The DNN architectures we consider are `problem agnostic', with width and depth only depending on the amount of training data $m$ and not on regularity assumptions of the target operator. Next, we show that DL is optimal for this problem: no recovery procedure can surpass these generalization bounds up to log terms. Finally, we present numerical results demonstrating the practical performance on challenging problems including the parametric diffusion, Navier-Stokes-Brinkman and Boussinesq PDEs.
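
As a rough illustration of the setup described in the abstract, the sketch below builds a fully-connected tanh network whose constant width exceeds its depth and evaluates an empirical $\ell^2$ loss on encoded input-output pairs. The dimensions `d_x` and `d_y`, the width and depth values, the random initialization and the synthetic data are all hypothetical choices made for illustration. They are not the architectures or scalings analyzed in the paper.

```python
import numpy as np

def init_mlp(d_in, d_out, width, depth, rng):
    """Random weights for a network with `depth` hidden layers of equal width."""
    sizes = [d_in] + [width] * depth + [d_out]
    return [
        (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Apply tanh on every hidden layer; the output layer is affine."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def l2_loss(params, X, Y):
    """Mean squared Euclidean error over the m training pairs."""
    residual = forward(params, X) - Y
    return np.mean(np.sum(residual**2, axis=1))

# Hypothetical sizes: encoded inputs in R^4, encoded outputs in R^6, m = 50 samples.
rng = np.random.default_rng(0)
d_x, d_y, m = 4, 6, 50
width, depth = 40, 4  # constant width chosen larger than the depth
params = init_mlp(d_x, d_y, width, depth, rng)
X = rng.uniform(-1.0, 1.0, size=(m, d_x))  # stand-in for encoder outputs
Y = rng.standard_normal((m, d_y))          # stand-in for encoded targets
loss = l2_loss(params, X, Y)
```

In practice the rows of `X` and `Y` would come from applying the approximate encoder and decoder to samples of the operator, and `loss` would be minimized over `params` with a standard optimizer.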

Paper Structure (53 sections, 26 theorems, 358 equations, 9 figures)

This paper contains 53 sections, 26 theorems, 358 equations, 9 figures.

Key Result

Theorem 3.1

Let $m \geq 3$, $\delta > 0$, $0 < \epsilon < 1$ and $L = L(m,\epsilon) = \log^4(m) + \log(1/\epsilon)$. Then there exists a class $\mathcal{N}$ of hyperbolic tangent (tanh) DNNs $N : \mathbb{R}^{d_{\mathcal{X}}} \rightarrow \mathbb{R}^{d_{\mathcal{Y}}}$ depending on $m$ and $\epsilon$ only with such that the following holds. Suppose that Assumption ass:main-ass holds and where $\mathcal{I}_{\mathc

Figures (9)

  • Figure 1: Elliptic diffusion equation. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the solution operator for the elliptic diffusion equation \ref{eq:Poisson}. The first two plots use the affine coefficient $a_{1,d}$ \ref{eq:affine_param_p2} with $d=4,8$, respectively. The rest use the log-transformed coefficient $a_{2,d}$ \ref{eq:affine_param_p5}.
  • Figure 2: NSB equations. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the velocity field $\bm{u}$ of the NSB problem in \ref{eq:NSB}. See Fig. \ref{NSB_res-additional} for results for the pressure component $p$. The diffusion coefficients $a_{1,d},a_{2,d}$ and $d = 4,8$ are as in Fig. \ref{Poisson_res}.
  • Figure 3: Boussinesq equation. Average relative $L^2_{\mu}(\mathcal{X} ; \widetilde{\mathcal{Y}})$-norm error versus $m$ for different DNNs approximating the temperature $\varphi$ of the Boussinesq problem in \ref{eq:BSQ} (see Fig. \ref{Boussinesq_res-additional} for $\bm{u}$ and $p$). The diffusion coefficients $a_{1,d},a_{2,d}$ and $d = 4,8$ are as in Fig. \ref{Poisson_res}. In this example, we also consider an additional parametric dependence in the tensor $\mathbb{K} = \mathbb{K}_d$ describing the thermal conductivity of the fluid. See § \ref{app:parametric_Boussinesq} and \ref{eq:affine_param_p3}.
  • Figure 4: The domain $\Omega$ and FE mesh for the parametric diffusion equation.
  • Figure 5: The solution $\bm{u}(\bm{x})$ of the parametric Poisson problem in \ref{eq:Poisson} for a given parameter $\bm{x}=(1,0,0,0)^{\top}$ with affine coefficient $a_{1,d}$ and $d=4$, using a total of $K=2622$ DoF. The left plot shows the solution given by the FEM solver. The right plot shows the ELU $4\times 40$ DNN approximation after $60,000$ epochs of training with $m=500$ training sample points.
  • ...and 4 more figures

Theorems & Definitions (61)

  • Definition 2.1: Holomorphic map
  • Remark 1: Holomorphy assumption
  • Theorem 3.1: Existence of good DNN architectures
  • Theorem 3.2: Fully-connected DNN architectures are good
  • Theorem 4.1: Optimal $L^2$ error rates
  • Theorem 4.2: Optimal $L^{\infty}$ error rates
  • Remark 2: Other auxiliary variables
  • Remark 3
  • Remark 4
  • Lemma D.1: Nikolskii inequality for polynomials
  • ...and 51 more