Universal approximation property of Banach space-valued random feature models including random neural networks
Ariel Neufeld, Philipp Schmocker
TL;DR
This work develops a Banach space-valued extension of random feature learning, enabling the approximation of infinite-dimensional objects by random feature maps that take values in a Banach space. It establishes a universal approximation theorem in Bochner $L^r$ spaces for three paradigms: random trigonometric features, random Fourier features, and random neural networks, with proofs leveraging full-support and density arguments. The authors derive approximation rates via Barron-type spaces and Banach-space types, and connect these to generalization error bounds for least-squares learning, including scenarios where random neural networks mitigate the curse of dimensionality. Numerical experiments on the heat equation and nonlinear Fokker–Planck equations illustrate empirical gains in speed and flexibility, supporting the practical relevance for high-dimensional PDE learning. Overall, the paper extends deterministic neural network universality to randomized architectures across broad function spaces and provides rigorous rates, error bounds, and actionable training procedures for Banach-space-valued learning tasks.
Abstract
We introduce a Banach space-valued extension of random feature learning, a data-driven supervised machine learning technique for large-scale kernel approximation. By randomly initializing the feature maps, only the linear readout needs to be trained, which reduces the computational complexity substantially. Viewing random feature models as Banach space-valued random variables, we prove a universal approximation result in the corresponding Bochner space. Moreover, we derive approximation rates and an explicit algorithm to learn an element of the given Banach space by such models. The framework of this paper includes random trigonometric/Fourier regression and, in particular, random neural networks, which are single-hidden-layer feedforward neural networks whose weights and biases are randomly initialized, whence only the linear readout needs to be trained. For the latter, we can then lift the universal approximation property of deterministic neural networks to random neural networks, even within function spaces over non-compact domains, e.g., weighted spaces, $L^p$-spaces, and (weighted) Sobolev spaces, where the latter includes the approximation of the (weak) derivatives. In addition, we analyze when the training costs for approximating a given function grow polynomially in both the input/output dimension and the reciprocal of a pre-specified tolerated approximation error. Furthermore, we demonstrate in a numerical example the empirical advantages of random feature models over their deterministic counterparts.
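To make the training procedure described in the abstract concrete, the following is a minimal sketch of a random neural network with a least-squares readout. It is not the authors' algorithm: it assumes a real-valued target rather than a Banach space-valued one, and the names `fit_random_network` and `predict`, the tanh activation, the Gaussian sampling of weights and biases, and the toy target function are all illustrative choices that do not come from the paper.

```python
import numpy as np


def fit_random_network(x, y, n_features=200, seed=0):
    """Sample hidden weights and biases once, then fit only the linear readout."""
    rng = np.random.default_rng(seed)
    input_dim = x.shape[1]
    # Hidden-layer parameters are drawn at random and never trained (assumed Gaussian).
    weights = rng.normal(size=(input_dim, n_features))
    biases = rng.normal(size=n_features)
    # Random feature matrix: one row per sample, one column per hidden neuron.
    features = np.tanh(x @ weights + biases)
    # The only trained part: a linear least-squares problem for the readout.
    readout, *_ = np.linalg.lstsq(features, y, rcond=None)
    return weights, biases, readout


def predict(x, weights, biases, readout):
    """Evaluate the random neural network at the points in x."""
    return np.tanh(x @ weights + biases) @ readout


# Toy data (hypothetical target, chosen only for illustration).
data_rng = np.random.default_rng(1)
x_train = data_rng.uniform(-1.0, 1.0, size=(500, 2))
y_train = np.sin(np.pi * x_train[:, 0]) * x_train[:, 1]

params = fit_random_network(x_train, y_train)
train_mse = float(np.mean((predict(x_train, *params) - y_train) ** 2))
print(f"training MSE: {train_mse:.3e}")
```

Because the hidden layer is fixed after sampling, training reduces to a single linear solve in the readout coefficients, which is the source of the computational savings mentioned above. In the Banach space-valued setting of the paper, the readout coefficients would be elements of the target space rather than real numbers.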
