Universal approximation property of Banach space-valued random feature models including random neural networks

Ariel Neufeld, Philipp Schmocker

TL;DR

This work develops a Banach space-valued extension of random feature learning, enabling the approximation of infinite-dimensional objects by random feature maps that take values in a Banach space. It establishes a universal approximation theorem in Bochner $L^r$ spaces for three paradigms: random trigonometric features, random Fourier features, and random neural networks, with proofs leveraging full-support and density arguments. The authors derive approximation rates via Barron-type spaces and Banach-space types, and connect these to generalization error bounds for least-squares learning, including scenarios where random neural networks mitigate the curse of dimensionality. Numerical experiments on the heat equation and nonlinear Fokker–Planck equations illustrate empirical gains in speed and flexibility, supporting the practical relevance for high-dimensional PDE learning. Overall, the paper extends deterministic neural network universality to randomized architectures across broad function spaces and provides rigorous rates, error bounds, and actionable training procedures for Banach-space-valued learning tasks.

Abstract

We introduce a Banach space-valued extension of random feature learning, a data-driven supervised machine learning technique for large-scale kernel approximation. By randomly initializing the feature maps, only the linear readout needs to be trained, which reduces the computational complexity substantially. Viewing random feature models as Banach space-valued random variables, we prove a universal approximation result in the corresponding Bochner space. Moreover, we derive approximation rates and an explicit algorithm to learn an element of the given Banach space by such models. The framework of this paper includes random trigonometric/Fourier regression and in particular random neural networks which are single-hidden-layer feedforward neural networks whose weights and biases are randomly initialized, whence only the linear readout needs to be trained. For the latter, we can then lift the universal approximation property of deterministic neural networks to random neural networks, even within function spaces over non-compact domains, e.g., weighted spaces, $L^p$-spaces, and (weighted) Sobolev spaces, where the latter includes the approximation of the (weak) derivatives. In addition, we analyze when the training costs for approximating a given function grow polynomially in both the input/output dimension and the reciprocal of a pre-specified tolerated approximation error. Furthermore, we demonstrate in a numerical example the empirical advantages of random feature models over their deterministic counterparts.
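The training scheme described in the abstract (random hidden weights and biases, with only the linear readout fitted) can be illustrated by a minimal sketch. This is not the paper's algorithm or its Banach space-valued setting: it assumes a scalar-valued target, standard normal hidden parameters, a tanh activation, and a plain least-squares readout. All function names, the target function $\sin$, and the sample sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_hidden_parameters(n_features, input_dim, rng):
    """Draw hidden weights and biases once; they are never trained afterwards.

    Assumption: standard normal initialization (chosen for illustration).
    """
    weights = rng.standard_normal((n_features, input_dim))
    biases = rng.standard_normal(n_features)
    return weights, biases


def hidden_layer(x, weights, biases):
    """Evaluate the random feature maps x -> tanh(weights @ x + biases)."""
    return np.tanh(x @ weights.T + biases)


def fit_readout(x, y, weights, biases):
    """Fit only the linear readout by least squares on the random features."""
    features = hidden_layer(x, weights, biases)
    readout, *_ = np.linalg.lstsq(features, y, rcond=None)
    return readout


def predict(x, weights, biases, readout):
    """Evaluate the random neural network with the trained readout."""
    return hidden_layer(x, weights, biases) @ readout


rng = np.random.default_rng(0)

# Hypothetical toy target: f(x) = sin(x) on [-3, 3].
x_train = rng.uniform(-3.0, 3.0, size=(500, 1))
y_train = np.sin(x_train[:, 0])

weights, biases = sample_hidden_parameters(200, 1, rng)
readout = fit_readout(x_train, y_train, weights, biases)

# Empirical mean squared error on a grid inside the training range.
x_test = np.linspace(-3.0, 3.0, 200)[:, None]
mse = np.mean((predict(x_test, weights, biases, readout) - np.sin(x_test[:, 0])) ** 2)
print(f"empirical mean squared error: {mse:.2e}")
```

Because the hidden parameters stay fixed after sampling, training reduces to a single linear least-squares problem in the readout, which is the source of the reduced computational cost mentioned in the abstract.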

Paper Structure

This paper contains 35 sections, 22 theorems, 148 equations, 12 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.2

Let Assumption AssCDF hold and let $\mathcal{G} \subseteq C^0(\Theta;X)$ such that $\mathop{\mathrm{span}}\nolimits_{\mathbb{K}}(\mathcal{G}(\Theta))$ is dense in $X$. Moreover, let $F \in L^r(\Omega,\mathcal{F}_\theta,\mathbb{P};X)$ for some $r \in [1,\infty)$. Then, for every $\varepsilon > 0$ the

Figures (12)

  • Figure 1: Empirical $L^2$-error defined in \ref{EqDefMSE}.
  • Figure 2: Approximation of the function $\mathbb{R} \ni u_1 \mapsto f(1,(u_1,0.4,\dots,0.4)) \in \mathbb{R}$.
  • Figure 4: Empirical $L^2$-error defined in \ref{EqDefMSE2}.
  • Figure 5: Approximation of the function $[0,T] \times \mathbb{R} \ni (t,u_1) \mapsto f(t,(u_1,0.4,\dots,0.4)) \in \mathbb{R}$ for $m = 10$.
  • Figure 6: Approximation of the function $[0,T] \times \mathbb{R} \ni (t,u_1) \mapsto f(t,(u_1,0.4,\dots,0.4)) \in \mathbb{R}$ for $m = 20$.
  • ...and 7 more figures

Theorems & Definitions (63)

  • Definition 2.1
  • Remark 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Theorem 3.2: Universal approximation
  • Corollary 3.3: Universal approximation
  • Corollary 3.4: Universal approximation
  • Example 3.6: neufeld24
  • Corollary 3.8: Universal approximation
  • ...and 53 more