Theory-to-Practice Gap for Neural Networks and Neural Operators

Philipp Grohs, Samuel Lanthaler, Margaret Trautner

TL;DR

This work establishes a unified theory-to-practice gap framework for learning with ReLU neural networks and neural-operator architectures. In finite dimensions, it sharpens bounds on the best possible sampling rate $\beta_\ast$ in general $L^p$-norms, showing $\beta_\ast \le \frac{1}{p} + \frac{1}{d}\cdot\frac{\alpha}{\alpha+\lfloor \bm{\ell}^\ast/2\rfloor}$, which reveals a dimension-dependent bottleneck that persists even for large parametric rates $\alpha$. Extending to infinite dimensions, the paper proves that for Deep Operator Networks and integral-kernel neural operators the optimal convergence rate in a Bochner $L^p$-norm is bounded by $\beta_\ast \le \frac{1}{p}$, while uniform convergence over infinite-dimensional input sets can achieve no algebraic rate ($\beta_\ast = 0$). The results apply to Fourier Neural Operators and related kernel-based operators, providing a cohesive $L^p$-theory that bridges finite- and infinite-dimensional learning and clarifies fundamental limits of these data-driven methods. These insights quantify the practical consequences of information constraints on learning mappings and operators in high dimensions.
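As a rough illustration, the finite-dimensional bound quoted above can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `sampling_rate_upper_bound` and its argument names are hypothetical, and handling $p = \infty$ and $\bm{\ell}^\ast = \infty$ by taking the corresponding limits ($1/p \to 0$ and a vanishing depth term) is an assumption made here for convenience.

```python
import math


def sampling_rate_upper_bound(p, d, alpha, ell_star):
    """Evaluate 1/p + (1/d) * alpha / (alpha + floor(ell_star / 2)).

    Hypothetical helper for illustration only.
    p        : L^p exponent, in [1, inf] (math.inf allowed)
    d        : input dimension, a positive integer
    alpha    : parametric approximation rate, in (0, inf)
    ell_star : depth parameter, an integer >= 3 or math.inf
    """
    if p < 1:
        raise ValueError("p must lie in [1, inf]")
    if d < 1:
        raise ValueError("d must be a positive integer")
    if not (0 < alpha < math.inf):
        raise ValueError("alpha must lie in (0, inf)")
    if ell_star < 3:
        raise ValueError("ell_star must be at least 3")

    # Assumption: p = inf is treated as the limit 1/p -> 0.
    inv_p = 0.0 if math.isinf(p) else 1.0 / p

    # Assumption: ell_star = inf is treated as the limit in which
    # alpha / (alpha + floor(ell_star / 2)) tends to 0.
    if math.isinf(ell_star):
        depth_term = 0.0
    else:
        depth_term = alpha / (alpha + int(ell_star) // 2)

    return inv_p + depth_term / d


if __name__ == "__main__":
    # Print the bound for a few dimensions with p = 2, alpha = 1, ell_star = 4.
    for dim in (1, 10, 100, 1000):
        print(dim, sampling_rate_upper_bound(p=2, d=dim, alpha=1.0, ell_star=4))
```

In this sketch the second term shrinks like $1/d$, so for large $d$ the value approaches $1/p$, which matches the form of the bound stated for the infinite-dimensional operator setting.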

Abstract

This work studies the sampling complexity of learning with ReLU neural networks and neural operators. For mappings belonging to relevant approximation spaces, we derive upper bounds on the best-possible convergence rate of any learning algorithm, with respect to the number of samples. In the finite-dimensional case, these bounds imply a gap between the parametric and sampling complexities of learning, known as the "theory-to-practice gap". In this work, a unified treatment of the theory-to-practice gap is achieved in a general $L^p$-setting, while at the same time improving available bounds in the literature. Furthermore, based on these results the theory-to-practice gap is extended to the infinite-dimensional setting of operator learning. Our results apply to Deep Operator Networks and integral kernel-based neural operators, including the Fourier neural operator. We show that the best-possible convergence rate in a Bochner $L^p$-norm is bounded by Monte-Carlo rates of order $1/p$.

Paper Structure

This paper contains 43 sections, 25 theorems, 212 equations.

Key Result

theorem 2.2

Let $p\in [1,\infty]$. Let $\bm{\ell}: \mathbb{N} \to \mathbb{N} \cup \{\infty\}$ be non-decreasing with $\bm{\ell}^\ast \ge 3$. Given $d\in \mathbb{N}$ and $\alpha \in (0,\infty)$, consider such that $U\subset C([0,1]^d)$. Then

Theorems & Definitions (53)

  • remark 2.1
  • theorem 2.2
  • corollary 2.3
  • remark 2.4
  • lemma 2.5: Existence of a void
  • proof
  • lemma 2.6: Localized networks
  • proof
  • lemma 2.7
  • proof
  • ...and 43 more