A Survey on Universal Approximation Theorems

Midhun T Augustine

TL;DR

This survey analyzes universal approximation theorems (UATs) for feedforward neural networks. It traces results from early function-approximation theory (Taylor, Fourier, Weierstrass, Kolmogorov–Arnold) to modern NN-based density results in spaces such as $\mathcal{C}(\mathbb{X})$ and $L^p$. It distinguishes the arbitrary-width (bounded-depth) framework from the arbitrary-depth (bounded-width) framework. It highlights that nonpolynomial activations make shallow networks universal, and it reviews width thresholds for universal approximation (e.g., $W\le n+4$ for the minimal width of universal ReLU networks and $W^* = \max\{n+1, m\}$). The paper also documents historical milestones (Le Cun, Lapedes–Farber, Cybenko, Hornik, Leshno) and clarifies how depth contributes to expressivity beyond the limits of width, with implications for architecture design. By connecting classical approximation theory to NN expressivity and outlining extensions to other architectures, the work provides a consolidated reference for researchers assessing the theoretical capabilities of neural networks.
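
To make the arbitrary-width (bounded-depth) idea concrete, here is a minimal sketch for the one-dimensional ReLU case, assuming a continuous target on an interval $[a, b]$. It uses the standard piecewise-linear interpolation construction $N(x) = f(a) + \sum_{k=0}^{W-1} c_k \,\mathrm{ReLU}(x - t_k)$, a single hidden layer of $W$ neurons. This is an illustrative construction, not necessarily the proof technique used in the survey, and the function names (`build_shallow_relu`, `evaluate`, `max_error`) are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical representation: (output bias, output weights c_k, hidden-unit knots t_k)
ShallowNet = Tuple[float, List[float], List[float]]


def relu(z: float) -> float:
    return max(0.0, z)


def build_shallow_relu(f: Callable[[float], float], a: float, b: float, width: int) -> ShallowNet:
    """One hidden layer of `width` ReLU units that linearly interpolates f
    at width + 1 equally spaced knots on [a, b]:
        N(x) = f(a) + sum_k c_k * relu(x - t_k)
    Hidden unit k has input weight 1 and bias -t_k."""
    knots = [a + (b - a) * k / width for k in range(width + 1)]
    values = [f(t) for t in knots]
    # Slope of the interpolant on each sub-interval [t_k, t_{k+1}].
    slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k]) for k in range(width)]
    # Each ReLU switches on at t_k and contributes the change in slope there.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, width)]
    return values[0], coeffs, knots[:width]


def evaluate(net: ShallowNet, x: float) -> float:
    bias, coeffs, knots = net
    return bias + sum(c * relu(x - t) for c, t in zip(coeffs, knots))


def max_error(f: Callable[[float], float], net: ShallowNet, a: float, b: float,
              samples: int = 2001) -> float:
    """Sup-norm error estimated on a uniform grid of `samples` points."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - evaluate(net, x)) for x in xs)


if __name__ == "__main__":
    def target(x: float) -> float:
        return math.sin(2 * math.pi * x)

    for width in (4, 16, 64):
        net = build_shallow_relu(target, 0.0, 1.0, width)
        print(width, max_error(target, net, 0.0, 1.0))
```

For a twice-differentiable target the linear-interpolation error scales like $O(1/W^2)$, so the printed errors should shrink as the hidden layer widens. The general theorems cover arbitrary continuous targets on inputs in $\mathbb{R}^n$, which this one-dimensional sketch does not attempt.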

Abstract

This paper discusses various theorems on the approximation capabilities of neural networks (NNs), known as universal approximation theorems (UATs). It gives a systematic overview of UATs, starting from preliminary results on function approximation such as Taylor's theorem, Fourier's theorem, the Weierstrass approximation theorem, and the Kolmogorov–Arnold representation theorem. Theoretical and numerical aspects of UATs are covered for both the arbitrary-width and the arbitrary-depth settings.

Paper Structure (7 sections, 10 theorems, 17 equations, 6 figures)

This paper contains 7 sections, 10 theorems, 17 equations, 6 figures.

Key Result

Theorem 1

Any continuous function $f:\mathbb{R} \rightarrow \mathbb{R}$ that is $k$ times differentiable at $a\in \mathbb{R}$ can be represented as a polynomial of degree $k$ in $(x-a)$ plus a residual term:

$$f(x) = \sum_{i=0}^{k} c_{i}(x-a)^{i} + R_{k}(x),$$

where $c_{i}=\frac{f^{(i)}(a)}{i!}= \frac{1}{i!} \frac{d^{i}}{dx^i}f(x)\big|_{x=a}$ and the residual term satisfies $R_{k}(x)=o(|x-a|^{k})$ as $x \to a$.
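
A minimal sketch of Theorem 1, assuming the derivatives $f^{(i)}(a)$ are supplied by hand. It builds the coefficients $c_i = f^{(i)}(a)/i!$, evaluates the Taylor polynomial with Horner's rule in $(x-a)$, and checks numerically for $f = \sin$ and $k = 3$ that $R_k(x)/|x-a|^k$ shrinks as $x \to a$, as the $o(|x-a|^k)$ bound requires. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def taylor_coefficients(derivatives: Sequence[float]) -> List[float]:
    """c_i = f^{(i)}(a) / i! for i = 0, ..., k, given [f(a), f'(a), ..., f^{(k)}(a)]."""
    return [d / math.factorial(i) for i, d in enumerate(derivatives)]


def taylor_polynomial(coeffs: Sequence[float], a: float, x: float) -> float:
    """Evaluate sum_i c_i (x - a)^i with Horner's rule in h = x - a."""
    h = x - a
    result = 0.0
    for c in reversed(coeffs):
        result = result * h + c
    return result


def sin_remainder_ratios(a: float, hs: Sequence[float]) -> List[float]:
    """For f = sin and k = 3, return R_3(a + h) / h^3 for each step h.
    The o(|x - a|^k) bound says these ratios should tend to 0 as h -> 0."""
    k = 3
    # sin, sin', sin'', sin''' evaluated at a
    derivs = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
    coeffs = taylor_coefficients(derivs)
    return [(math.sin(a + h) - taylor_polynomial(coeffs, a, a + h)) / h**k for h in hs]


if __name__ == "__main__":
    steps = (0.1, 0.01, 0.001)
    for h, r in zip(steps, sin_remainder_ratios(0.5, steps)):
        print(h, r)
```

Here the ratio decays roughly linearly in $h$, because the first omitted term is of order $h^{k+1}$.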

Figures (6)

  • Figure 1: (a) Neural Network (b) Neuron.
  • Figure 2: Graph of activation functions: (a) ReLU (b) Step (c) Logistic (d) Tanh.
  • Figure 3: Illustrating NN.
  • Figure 4: (a) NN with arbitrary width (b) NN with arbitrary depth.
  • Figure 5: Output of NNs with one hidden layer and ReLU activation function.
  • ...and 1 more figure

Theorems & Definitions (10)

  • Theorem 1: Taylor, 1715
  • Theorem 2: Fourier, 1807
  • Theorem 3: Weierstrass, 1885
  • Theorem 4: Kolmogorov and Arnold, 1959
  • Theorem 5: Pascanu et al., 2013
  • Theorem 6: Funahashi, Hornik et al., and Cybenko, 1989
  • Theorem 7: Leshno et al., 1993
  • Theorem 8: Lu et al., 2017
  • Theorem 9: Lu et al., 2017
  • Theorem 10: Park et al., 2021
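
Theorem 3 (Weierstrass) can be illustrated with Bernstein polynomials, a standard constructive proof that may differ from the route taken in the survey. The sketch below assumes the target is continuous on $[0, 1]$ and evaluates $B_n(f)(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$. The helper names are hypothetical.

```python
import math
from typing import Callable


def bernstein(f: Callable[[float], float], n: int, x: float) -> float:
    """Degree-n Bernstein polynomial of f on [0, 1]:
    B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f: Callable[[float], float], n: int, samples: int = 201) -> float:
    """Uniform error |f - B_n(f)| estimated on a grid of `samples` points in [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)


if __name__ == "__main__":
    def target(x: float) -> float:
        # Continuous on [0, 1] but not differentiable at x = 0.5.
        return abs(x - 0.5)

    for n in (10, 40, 160):
        print(n, sup_error(target, n))
```

Convergence is slow (for a 1-Lipschitz target the uniform error is at most $1/(2\sqrt{n})$), but uniform convergence for every continuous $f$ is exactly what the theorem guarantees.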