Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

TL;DR

The article surveys the problem of learning smooth, high-dimensional target functions from limited data, focusing on infinite-dimensional holomorphic function classes relevant to parametric PDEs and uncertainty quantification (UQ). It contrasts sparse polynomial methods with deep neural networks, deriving near-optimal learnability rates $\mathcal{O}\bigl((m/\log^4 m)^{1/2-1/p}\bigr)$ for $f\in\mathcal{H}(p,\mathsf{M})$ and developing a practical existence theory in which trained DNNs achieve similar performance by emulating polynomial approximants. A central theme is handling unknown anisotropy through weighted sparsity and anchored/lower index sets, which enables dimension-independent convergence without prior knowledge of $\bm{b}$. The practical existence framework then ties theory to practice by proposing architectures and training schemes that attain the near-optimal rates while remaining robust to measurement and discretization errors, thereby narrowing the theory-practice gap in data-scarce, high-dimensional settings.

Abstract

Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.

Paper Structure (45 sections, 8 theorems, 117 equations, 1 figure)

This paper contains 45 sections, 8 theorems, 117 equations, 1 figure.

Key Result

Theorem 4.1

Let $\bm{b} \in [0,\infty)^{\mathbb{N}}$ be such that $\bm{b} \in \ell^p(\mathbb{N})$ for some $0 < p < 1$. Then for any $s \in \mathbb{N}$ and $p \leq q \leq 2$, there exists a set $S \subset \mathcal{F}$ with $|S| \leq s$ such that for all $f \in \mathcal{H}(\bm{b})$ with coefficients $\bm{c}$ as in f-coeff.
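
The notion of best $s$-term approximation used here can be illustrated numerically. The sketch below is not the theorem itself: it takes a hypothetical coefficient sequence $c_i = i^{-3}$ (an assumption for illustration, truncated to finitely many terms), keeps the $s$ largest entries, and compares the $\ell^q$ norm of the remainder with the standard Stechkin-type bound $\|\bm{c}\|_p \, s^{1/q-1/p}$. The function names are illustrative only.

```python
import numpy as np

def best_s_term_error(c, s, q):
    """l^q norm of what remains after removing the s largest-magnitude entries of c."""
    a = np.sort(np.abs(c))[::-1]   # magnitudes in decreasing order
    tail = a[s:]                   # entries not captured by the best s-term set
    return float(np.sum(tail ** q) ** (1.0 / q))

def stechkin_bound(c, s, p, q):
    """Stechkin-type bound ||c||_p * s^(1/q - 1/p), valid for 0 < p <= q."""
    norm_p = np.sum(np.abs(c) ** p) ** (1.0 / p)
    return float(norm_p * s ** (1.0 / q - 1.0 / p))

# Hypothetical coefficients c_i = i^(-3), truncated to 10^4 terms (assumption).
i = np.arange(1, 10001).astype(float)
c = i ** -3.0
p, q = 0.5, 2.0

for s in (1, 10, 100):
    print(s, best_s_term_error(c, s, q), stechkin_bound(c, s, p, q))
```

With $p = 1/2$ and $q = 2$ the bound decays like $s^{-3/2}$, which is the kind of algebraic rate the theorem title refers to; the actual error for this particular sequence decays faster.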

Figures (1)

  • Figure 1: Best $s$-term approximation error in the $L^2_{\varrho}(\mathcal{U})$-norm for \ref{f-numerics} with $\delta_i = i^{3/2}$. This figure also shows the exponential rate "exp. rate", defined as $C_{\mathsf{exp}} \cdot \exp \left ( - \left ( s d! \prod^{d}_{i=1} \log(\rho_i) \right )^{1/d} \right )$, where $\rho_i$ is such that $(\rho_i+1/\rho_i)/2 = 1+\delta_i$, and the algebraic rate "alg. rate", defined as $C_{\mathsf{alg}} \cdot s^{-1}$. The constants $C_{\mathsf{exp}}$ and $C_{\mathsf{alg}}$ are chosen empirically to aid visualization.
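
The two reference rates in the caption above can be evaluated directly. The following is a minimal sketch, assuming $d$ denotes the number of variables and setting both constants to 1 (the caption says they are chosen empirically, so these values are placeholders). The helper names are hypothetical. The parameter $\rho_i$ is obtained by solving $(\rho_i + 1/\rho_i)/2 = 1 + \delta_i$ for the root with $\rho_i > 1$.

```python
import math

def rho_from_delta(delta):
    """Root rho > 1 of (rho + 1/rho)/2 = 1 + delta, i.e. rho = t + sqrt(t^2 - 1), t = 1 + delta."""
    t = 1.0 + delta
    return t + math.sqrt(t * t - 1.0)

def exp_rate(s, d, c_exp=1.0):
    """C_exp * exp(-(s * d! * prod_i log(rho_i))^(1/d)) with delta_i = i^(3/2)."""
    log_prod = math.prod(math.log(rho_from_delta(i ** 1.5)) for i in range(1, d + 1))
    return c_exp * math.exp(-((s * math.factorial(d) * log_prod) ** (1.0 / d)))

def alg_rate(s, c_alg=1.0):
    """C_alg * s^(-1)."""
    return c_alg / s

# Compare the two rates for a few values of s in d = 4 variables (d chosen for illustration).
for s in (1, 10, 100, 1000):
    print(s, exp_rate(s, d=4), alg_rate(s))
```

For fixed $s$, the exponent scales like $s^{1/d}$, so the exponential rate weakens as $d$ grows, while the algebraic rate $s^{-1}$ does not depend on $d$.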

Theorems & Definitions (24)

  • remark 1: Other measures and domains
  • remark 2: Error metric
  • Definition 3.1: Holomorphy
  • Definition 3.2: Holomorphic extension
  • Definition 3.3: $(\bm{b},\varepsilon)$-holomorphic functions
  • remark 3: Functions of finitely many variables
  • remark 4
  • Theorem 4.1: Algebraic convergence of the best $s$-term approximation
  • remark 5: Sharpness of the algebraic rate
  • Definition 5.1: Adaptive $m$-width
  • ...and 14 more