Noncompact uniform universal approximation

Teun D. H. van Nuland

Abstract

The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb{R}^n$. All continuous functions that vanish at infinity can be uniformly approximated by neural networks with one hidden layer, for all activation functions $\varphi$ that are continuous, nonpolynomial, and asymptotically polynomial at $\pm\infty$. When $\varphi$ is moreover bounded, we exactly determine which functions can be uniformly approximated by neural networks, with the following unexpected results. Let $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ denote the vector space of functions that are uniformly approximable by neural networks with $l$ hidden layers and $n$ inputs. For all $n$ and all $l\geq2$, $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ turns out to be an algebra under the pointwise product. If the left limit of $\varphi$ differs from its right limit (for instance, when $\varphi$ is sigmoidal) the algebra $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ ($l\geq2$) is independent of $\varphi$ and $l$, and equals the closed span of products of sigmoids composed with one-dimensional projections. If the left limit of $\varphi$ equals its right limit, $\overline{\mathcal{N}_\varphi^l(\mathbb{R}^n)}$ ($l\geq1$) equals the (real part of the) commutative resolvent algebra, a C*-algebra which is used in mathematical approaches to quantum theory. In the latter case, the algebra is independent of $l\geq1$, whereas in the former case $\overline{\mathcal{N}_\varphi^2(\mathbb{R}^n)}$ is strictly bigger than $\overline{\mathcal{N}_\varphi^1(\mathbb{R}^n)}$.
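The abstract connects the case $\varphi(-\infty)=\varphi(\infty)$ to the commutative resolvent algebra. One worked identity illustrates this, under two assumptions made here and not taken from the paper:

- The activation is $\varphi(t)=1/(1+t^2)$. It is bounded, continuous and nonpolynomial, and its left and right limits are both $0$.
- The commutative resolvent algebra on $\mathbb R^n$ is, as usual, the C*-algebra generated by the resolvent functions $x\mapsto(i\lambda-a\cdot x)^{-1}$, where $\lambda\in\mathbb R\setminus\{0\}$ and $a\in\mathbb R^n$.

Writing $s=a\cdot x$, a single neuron without bias is then minus the imaginary part of a resolvent function:

$$\frac{1}{i\lambda - s} = \frac{-i\lambda - s}{(i\lambda - s)(-i\lambda - s)} = \frac{-s - i\lambda}{\lambda^2 + s^2}, \qquad\text{so}\qquad \varphi(a\cdot x) = \frac{1}{1+(a\cdot x)^2} = -\operatorname{Im}\frac{1}{i - a\cdot x}.$$

The algebra is closed under complex conjugation. Hence $\operatorname{Im} h=(h-\bar h)/(2i)$ lies in it for every generator $h$, and such neurons are real-valued elements of the algebra. The sketch says nothing about neurons with a bias or about the converse inclusion.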

Paper Structure (11 sections, 24 theorems, 94 equations, 3 figures)

Introduction
The case $\varphi(-\infty)=\varphi(\infty)$
The case $\varphi(-\infty)\neq\varphi(\infty)$
Notation and summary of main results
Approximation of continuous functions vanishing at infinity
Bounded activation functions with identical left and right limits
Bounded activation functions with distinct left and right limits
Converse inclusion
Difference between one-layer and two-layer networks
Nonzero one-layer networks do not vanish at infinity
Open questions

Key Result

Theorem 2.1

Let $n,l\in\mathbb N$, and let $\varphi\in C(\mathbb R)$ be nonlinear with $\lim_{x\to\infty}(\varphi(x)-a_1x-b_1)=0$ and $\lim_{x\to-\infty}(\varphi(x)-a_{2}x-b_{2})=0$ for certain $a_1,b_1,a_2,b_2\in\mathbb R$. Then,
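Theorem 2.1 covers, for example, the ReLU activation $\varphi(x)=\max(0,x)$, with $a_1=1$, $b_1=0$ and $a_2=b_2=0$. The one-dimensional sketch below is an illustration chosen here, not the construction used in the paper. It shows the basic mechanism: three shifted ReLU neurons with outer weights $1,-2,1$ cancel each other's linear growth. The result is a compactly supported hat function, which lies in $C_0(\mathbb R)$. The names `relu` and `hat` are hypothetical.

```python
def relu(x: float) -> float:
    """ReLU: asymptotically x + 0 at +infinity and 0*x + 0 at -infinity."""
    return max(0.0, x)


def hat(x: float) -> float:
    """One-hidden-layer ReLU network with three neurons and outer weights 1, -2, 1.

    For x <= -1 every neuron is inactive; for x >= 1 the three affine pieces
    (x + 1) - 2x + (x - 1) sum to zero. In between, the output is the
    triangular bump 1 - |x|, so the network is compactly supported.
    """
    return relu(x + 1.0) - 2.0 * relu(x) + relu(x - 1.0)


if __name__ == "__main__":
    # Sample the network on a few points, including far-away ones.
    for x in [-100.0, -1.0, -0.5, 0.0, 0.5, 1.0, 100.0]:
        print(f"hat({x}) = {hat(x)}")
```

This cancellation alone does not suffice in $\mathbb R^n$ with $n\geq2$. A ridge function $x\mapsto h(w\cdot x)$ is constant along every hyperplane orthogonal to $w$, so sums over many directions are needed, as in Figure 2.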

Figures (3)

Figure 1: Example of a neural network in which wedge functions (cf. the definition of wedge functions and Figure 3) are clearly visible in the contour plot. The network has been given insufficient nodes/layers/time to fit the data at all relevant scales, and has only succeeded on the small scale. At a slightly larger scale the wedge functions already become apparent, and this paper proves that this behaviour is in fact unavoidable at sufficiently large scale. This image was produced using matlabsolutions.com/visualize-neural-network/neural-network.html.
Figure 2: First three elements of a sequence of 1-layer neural networks uniformly approximating a function in $C_0(\mathbb R^2)$. Cf. the lemma on Riemann sums. To increase the locality of the limit function, the ridge functions $g\circ p_a$ need to satisfy $\int g(x)\,dx=0$, unlike what is shown in the picture. Note that $L^p$-convergence is out of the question, as each element of the sequence has infinite integral norm, cf. Pinkus.
Figure 3: Contour plot of two wedge functions on $\mathbb R^2$.
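The Riemann-sum picture of Figure 2 can be mimicked numerically. This is a hypothetical construction, not the one from the paper. The following are choices made here:

- The ridge profile is $g(t)=(1-2t^2)e^{-t^2}$, which satisfies $\int g=0$ as the caption requires.
- The directions are the $N$ equally spaced unit vectors $a_k=(\cos(\pi k/N),\sin(\pi k/N))$.
- The function names are hypothetical.

Each term $x\mapsto g(a_k\cdot x)$ is a ridge function $g\circ p_{a_k}$. In an actual one-hidden-layer network the profile $g$ would itself be approximated by neurons, a step the sketch skips.

```python
import math


def g(t: float) -> float:
    """Hypothetical ridge profile (1 - 2t^2) exp(-t^2).

    Its integral over R is sqrt(pi) - 2 * (sqrt(pi) / 2) = 0.
    For large |t|, math.exp underflows to 0.0, so g(t) is numerically 0.
    """
    return (1.0 - 2.0 * t * t) * math.exp(-t * t)


def ridge_average(x: tuple[float, float], n_dirs: int) -> float:
    """Riemann-sum-style average (1/N) * sum_k g(a_k . x) over N = n_dirs
    equally spaced directions a_k = (cos(pi k/N), sin(pi k/N))."""
    total = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        proj = math.cos(theta) * x[0] + math.sin(theta) * x[1]  # p_{a_k}(x) = a_k . x
        total += g(proj)
    return total / n_dirs


if __name__ == "__main__":
    far = (0.0, 1.0e4)  # far from the origin, on the line x_1 = 0 orthogonal to a_0 = (1, 0)
    for n in (8, 64, 512):
        print(n, ridge_average((0.0, 0.0), n), ridge_average(far, n))
```

Far out along the line $x_1=0$, only the direction $a_0=(1,0)$ gives $a_0\cdot x=0$, so the average there equals $1/N$. Each finite sum therefore fails to vanish at infinity, while its values far out along that line shrink as $N$ grows. This is consistent with the caption's point that the convergence is uniform rather than in $L^p$.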

Theorems & Definitions (50)

Theorem 2.1
Theorem 2.2
Corollary 2.3
proof
Theorem 2.4
Lemma 3.1
proof
Lemma 3.2
proof
Lemma 3.3
...and 40 more
