
Near-optimal learning of Banach-valued, high-dimensional functions via deep neural networks

Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

TL;DR

This work develops a theoretical framework showing that deep neural networks can near-optimally learn infinite-dimensional, Banach-valued, holomorphic functions from limited, noisy data. By connecting DNNs with polynomial emulation and Banach-valued compressed sensing, the authors obtain practical existence theorems that yield dimension-free architectures and provable error bounds, accounting for discretization, sampling, and optimization errors. They address both unknown and known anisotropy and provide explicit rates that are near-optimal, especially in the Hilbert-valued setting, while extending prior scalar/Hilbert results to the Banach-valued regime. The results substantiate the use of DL for high-dimensional parametric PDE problems and lay out a rigorous, nonintrusive alternative to classical methods, with clear directions for future improvements and extensions to operator learning and mesh-invariant schemes.

Abstract

The past decade has seen increasing interest in applying Deep Learning (DL) to Computational Science and Engineering (CSE). Driven by impressive results in applications such as computer vision, Uncertainty Quantification (UQ), genetics, simulations and image processing, DL is increasingly supplanting classical algorithms, and seems poised to revolutionize scientific computing. However, DL is not yet well-understood from the standpoint of numerical analysis. Little is known about the efficiency and reliability of DL from the perspectives of stability, robustness, accuracy, and sample complexity. In particular, approximating solutions to parametric PDEs is an objective of UQ for CSE. Training data for such problems is often scarce and corrupted by errors. Moreover, the target function is a possibly infinite-dimensional smooth function taking values in the PDE solution space, generally an infinite-dimensional Banach space. This paper provides arguments for Deep Neural Network (DNN) approximation of such functions, with both known and unknown parametric dependence, that overcome the curse of dimensionality. We establish practical existence theorems that describe classes of DNNs with dimension-independent architecture size and training procedures based on minimizing the (regularized) $\ell^2$-loss which achieve near-optimal algebraic rates of convergence. These results involve key extensions of compressed sensing for Banach-valued recovery and polynomial emulation with DNNs. When approximating solutions of parametric PDEs, our results account for all sources of error, i.e., sampling, optimization, approximation and physical discretization, and allow for training high-fidelity DNN approximations from coarse-grained sample data. Our theoretical results fall into the category of non-intrusive methods, providing a theoretical alternative to classical methods for high-dimensional approximation.
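
To make the training procedure mentioned above more concrete, one common form of a regularized $\ell^2$-loss minimization problem is sketched below. This is an illustration only. The symbols $m$ (number of samples), $\bm{y}_i$ (sample points), $d_i$ (noisy samples of the target function $f$), $n_i$ (noise), $\mathcal{N}$ (the class of DNNs), $\lambda$ (regularization parameter) and $\mathcal{J}$ (regularization term) are assumptions introduced here and are not fixed by the abstract; the exact loss used in the paper may differ.

```latex
% Illustrative sketch only: all notation here is assumed, not taken from the abstract.
% Requires amsmath (\operatorname*) and bm (\bm).
\[
  \hat{\Phi} \in \operatorname*{argmin}_{\Phi \in \mathcal{N}}
  \sqrt{ \frac{1}{m} \sum_{i=1}^{m}
         \bigl\| d_i - \Phi(\bm{y}_i) \bigr\|_{\mathcal{V}}^{2} }
  \; + \; \lambda \, \mathcal{J}(\Phi),
  \qquad d_i = f(\bm{y}_i) + n_i .
\]
```

Setting $\lambda = 0$ in this sketch gives the unregularized loss, which corresponds to the parenthetical "(regularized)" in the abstract.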

Paper Structure

This paper contains 33 sections, 16 theorems, 284 equations.

Key Result

Theorem 4.1

There are universal constants $c_0$, $c_1 \geq 1$ such that the following holds. Let $m\geq 3$, $0 < \epsilon < 1$, $0<p \leq 1/2$, $\varepsilon > 0$, $\varrho$ be either the uniform or Chebyshev probability measure over $\mathcal{U} = [-1,1]^{\mathbb{N}}$, $\mathcal{V}$ be a Banach space, and [...]. Then there exist [...] such that the following holds for every $\bm{b} \in \ell^{p}_{\mathsf{M}}$ [...]

Theorems & Definitions (34)

  • Remark 1
  • Remark 2: Other activation functions
  • Remark 3
  • Theorem 4.1: Banach-valued learning; unknown anisotropy
  • Theorem 4.2: Hilbert-valued learning; unknown anisotropy
  • Theorem 4.3: Banach-valued learning; known anisotropy
  • Theorem 4.4: Hilbert-valued learning; known anisotropy
  • Lemma 5.1
  • Proof
  • Definition 5.2
  • ...and 24 more