An elementary proof of a universal approximation theorem

Chris Monico

TL;DR

The paper gives an elementary proof of a universal approximation theorem for neural networks with three hidden layers and a 0-1 squashing activation on a compact domain $K$. It proceeds via three strong separation lemmas that progressively separate points, points from closed sets, and disjoint closed sets, to show that the fourth-layer class $\mathcal{N}_4$ is dense in $C(K)$ under the uniform norm. Although the result is weaker than the classical ones, the method relies only on undergraduate analysis and yields a transparent, epsilon-chasing construction. The work also notes corollaries, extensions to vector-valued targets, and a potential reduction to two hidden layers.

Abstract

In this short note, we give an elementary proof of a universal approximation theorem for neural networks with three hidden layers and increasing, continuous, bounded activation function. The result is weaker than the best known results, but the proof is elementary in the sense that no machinery beyond undergraduate analysis is used.
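
To make the object of the theorem concrete, the following is a minimal sketch of a scalar-valued network with three hidden layers followed by an affine output. The logistic sigmoid is used only as one example of an increasing, continuous, bounded activation. The names `sigma`, `affine`, and `three_hidden_layer_net`, the layer widths, and the parameter layout are illustrative assumptions and are not taken from the paper.

```python
import math


def sigma(z):
    """Logistic sigmoid (assumed example of an increasing, continuous, bounded activation)."""
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(weights, biases, x):
    """Return W x + b, where W is a list of rows and x is a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def three_hidden_layer_net(params, x):
    """Evaluate a network with three sigmoid hidden layers and an affine output.

    params is a list of four (weights, biases) pairs (hypothetical layout):
    three hidden layers followed by a single-output affine layer.
    """
    if len(params) != 4:
        raise ValueError("expected three hidden layers plus one output layer")
    h = x
    for weights, biases in params[:-1]:
        h = [sigma(z) for z in affine(weights, biases, h)]
    out_weights, out_biases = params[-1]
    return affine(out_weights, out_biases, h)[0]


# Toy parameters: input dimension 2, hidden widths 2, 2, 2, scalar output.
zero_layer = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
toy_params = [zero_layer, zero_layer, zero_layer, ([[1.0, 1.0]], [0.0])]
toy_value = three_hidden_layer_net(toy_params, [0.3, -0.7])
```

With all hidden weights and biases set to zero, every hidden unit outputs `sigma(0) = 0.5`, so the toy network returns `0.5 + 0.5` regardless of the input. The theorem concerns what such networks can approximate as the weights and widths vary.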

Paper Structure (4 sections, 4 theorems, 3 equations)

Key Result

Lemma 3.1

Let $x_0$ and $x_1$ be distinct real numbers. For each $\epsilon>0$ there exist $s,t\in\mathbb{R}$ such that $\sigma(s+tx_0)<\epsilon$ and $\sigma(s+tx_1)>1-\epsilon$. If, in addition, $x_0<x_1$ and $\epsilon<1/2$, then $\sigma(s+tx)<\epsilon$ on the interval $(-\infty, x_0]$ and $\sigma(s+tx)>1-\epsilon$ ...
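
One way to see why such $s$ and $t$ exist is to pick two targets $a<b$ with $\sigma(a)<\epsilon$ and $\sigma(b)>1-\epsilon$, then solve the linear system $s+tx_0=a$, $s+tx_1=b$. The sketch below does this for the logistic sigmoid, which is an assumed example of the activation. The function name `separate` and the choice of targets are illustrative and may differ from the argument in the paper.

```python
import math


def sigma(z):
    """Logistic sigmoid (assumed example of a 0-1 squashing activation)."""
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def separate(x0, x1, eps):
    """Return (s, t) with sigma(s + t*x0) < eps and sigma(s + t*x1) > 1 - eps."""
    if x0 == x1:
        raise ValueError("x0 and x1 must be distinct")
    if eps <= 0:
        raise ValueError("eps must be positive")
    # A smaller eps only strengthens the conclusion, so cap it at 1/2.
    eps = min(eps, 0.5)
    # For the logistic sigmoid, sigma(a) = eps/2 at a = log((eps/2) / (1 - eps/2)),
    # and by symmetry sigma(-a) = 1 - eps/2.
    a = math.log((eps / 2) / (1 - eps / 2))
    b = -a
    # Solve s + t*x0 = a and s + t*x1 = b for s and t.
    t = (b - a) / (x1 - x0)
    s = a - t * x0
    return s, t


# Example: separate x0 = 0 from x1 = 1 to within eps = 0.1.
s_demo, t_demo = separate(0.0, 1.0, 0.1)
```

When $x_0<x_1$ the computed slope `t` is positive, so `sigma(s + t*x)` is increasing in `x`. It therefore stays below `eps` everywhere to the left of `x0` and above `1 - eps` everywhere to the right of `x1`.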

Theorems & Definitions (4)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.4: A Universal Approximation Theorem