
Noncompact uniform universal approximation

Teun D. H. van Nuland

Abstract

The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb{R}^n$. All continuous functions that vanish at infinity can be uniformly approximated by neural networks with one hidden layer, for all activation functions $\varphi$ that are continuous, nonpolynomial, and asymptotically polynomial at $\pm\infty$. When $\varphi$ is moreover bounded, we exactly determine which functions can be uniformly approximated by neural networks, with the following unexpected results. Let $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ denote the vector space of functions that are uniformly approximable by neural networks with $l$ hidden layers and $n$ inputs. For all $n$ and all $l\geq2$, $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ turns out to be an algebra under the pointwise product. If the left limit of $\varphi$ differs from its right limit (for instance, when $\varphi$ is sigmoidal) the algebra $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ ($l\geq2$) is independent of $\varphi$ and $l$, and equals the closed span of products of sigmoids composed with one-dimensional projections. If the left limit of $\varphi$ equals its right limit, $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ ($l\geq1$) equals the (real part of the) commutative resolvent algebra, a C*-algebra which is used in mathematical approaches to quantum theory. In the latter case, the algebra is independent of $l\geq1$, whereas in the former case $\overline{\mathcal{N}_\varphi^2(\mathbb{R}^n)}$ is strictly bigger than $\overline{\mathcal{N}_\varphi^1(\mathbb{R}^n)}$.
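As a rough illustration of why uniform approximation on all of $\mathbb{R}^n$ is more delicate than on compact sets, the following minimal sketch evaluates a one-hidden-layer network $x\mapsto\sum_i c_i\,\varphi(a_i\cdot x+b_i)$ with a sigmoidal $\varphi$. The function name `one_layer_network`, the particular weights, and the choice of the logistic sigmoid are assumptions made for this example and are not taken from the paper. The two-node network below is a single ridge function: it decays along its weight direction but stays constant along the orthogonal direction, so on its own it does not vanish at infinity in $\mathbb{R}^2$.

```python
import numpy as np


def sigmoid(t):
    """Logistic sigmoid: left limit 0, right limit 1 (illustrative choice of phi)."""
    return 1.0 / (1.0 + np.exp(-t))


def one_layer_network(x, weights, biases, coeffs, phi=sigmoid):
    """Evaluate x -> sum_i coeffs[i] * phi(weights[i] . x + biases[i]).

    x:       array of shape (m, n), m input points in R^n
    weights: array of shape (k, n), one weight vector per hidden node
    biases:  array of shape (k,)
    coeffs:  array of shape (k,), output-layer coefficients
    """
    x = np.atleast_2d(x)
    pre_activation = x @ weights.T + biases  # shape (m, k)
    return phi(pre_activation) @ coeffs      # shape (m,)


# Hypothetical two-node network on R^2: a "bump" g(t) = phi(t + 1) - phi(t - 1)
# composed with the projection onto the first coordinate.
weights = np.array([[1.0, 0.0], [1.0, 0.0]])
biases = np.array([1.0, -1.0])
coeffs = np.array([1.0, -1.0])

ts = (0.0, 10.0, 50.0)
# Moving along the weight direction (1, 0): the values decay towards 0.
along = one_layer_network(np.array([[t, 0.0] for t in ts]), weights, biases, coeffs)
# Moving along the orthogonal direction (0, 1): the values never change.
across = one_layer_network(np.array([[0.0, t] for t in ts]), weights, biases, coeffs)

print("along weight direction:", along)
print("orthogonal direction:  ", across)
```

Since each individual ridge function behaves this way, a network that uniformly approximates a function vanishing at infinity has to combine many directions so that these non-decaying contributions cancel far from the origin.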

Paper Structure (11 sections, 24 theorems, 94 equations, 3 figures)


Key Result

Theorem 2.1

Let $n,l\in\mathbb N$, and let $\varphi\in C(\mathbb R)$ be nonlinear with $\lim_{x\to\infty}(\varphi(x)-a_1x-b_1)=0$ and $\lim_{x\to-\infty}(\varphi(x)-a_{2}x-b_{2})=0$ for certain $a_1,b_1,a_2,b_2\in\mathbb R$. Then,

Figures (3)

  • Figure 1: Example of a neural network in which wedge functions (cf. the definition of wedge functions and Figure 3) are clearly visible in the contour plot. The network has been given insufficient nodes/layers/time to fit the data at all relevant scales, and has only succeeded on the small scale. At a slightly larger scale the wedge functions already become apparent, and this paper proves that this behaviour is in fact unavoidable at sufficiently large scale. This image was produced using matlabsolutions.com/visualize-neural-network/neural-network.html.
  • Figure 2: First three elements of a sequence of 1-layer neural networks uniformly approximating a function in $C_0(\mathbb R^2)$ (cf. the lemma on Riemann sums). To increase the locality of the limit function, the ridge functions $g\circ p_a$ need to satisfy $\int g(x)\,dx=0$, unlike what is shown in the picture. Note that $L^p$-convergence is out of the question, as each element of the sequence has infinite integral norm, cf. Pinkus.
  • Figure 3: Contour plot of two wedge functions on $\mathbb R^2$.

Theorems & Definitions (50)

  • Theorem 2.1
  • Theorem 2.2
  • Corollary 2.3
  • proof
  • Theorem 2.4
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • ...and 40 more