
Minimum width for universal approximation using ReLU networks on compact domain

Namjun Kim, Chanho Min, Sejun Park

TL;DR

The paper resolves the exact minimum width needed for universal approximation by ReLU networks on a compact domain, showing $w_{\min}=\max\{d_x,d_y,2\}$ for $L^p([0,1]^{d_x},\mathbb{R}^{d_y})$ and extending this to ReLU-like activations. It also proves a lower bound $w_{\min}\ge d_y+1$ for uniform approximation when $d_x<d_y\le2d_x$, revealing a dichotomy between the $L^p$ and uniform regimes. The key technique is a coding-based upper bound: an encoder–decoder construction that achieves the tight width $\max\{d_x,d_y,2\}$, contrasted with a topological argument for the uniform-case lower bound. The results clarify how compact versus unbounded domains affect the width needed for universal approximation and suggest extensions to RNNs and BRNNs. Overall, the work sharpens our understanding of the expressive power of deep networks of bounded width and of how it depends on the activation function.

Abstract

It has been shown that deep neural networks of a large enough width are universal approximators, but they are not if the width is too small. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if the activation function is ReLU-like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb R^{d_x}$, our result is the first to show that approximation on a compact domain requires a smaller width than on $\mathbb R^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.
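To make the comparison in the abstract concrete, here is a minimal Python sketch that evaluates the three width formulas quoted above for a few $(d_x, d_y)$ pairs. The function names are hypothetical helpers for illustration; they only restate the formulas from the abstract and say nothing about the constructions behind them.

```python
def min_width_lp_compact(d_x: int, d_y: int) -> int:
    """w_min for L^p approximation on [0,1]^{d_x} with ReLU-like activations (Theorem 1)."""
    return max(d_x, d_y, 2)


def min_width_lp_relu_whole_space(d_x: int, d_y: int) -> int:
    """Previously known w_min for L^p approximation by ReLU networks on R^{d_x}."""
    return max(d_x + 1, d_y)


def uniform_lower_bound(d_x: int, d_y: int):
    """Lower bound d_y + 1 for uniform approximation; stated only when d_x < d_y <= 2 d_x."""
    if d_x < d_y <= 2 * d_x:
        return d_y + 1
    return None  # the abstract gives no bound from this result outside that range


for d_x, d_y in [(2, 1), (3, 3), (2, 3), (3, 5)]:
    print(
        f"d_x={d_x}, d_y={d_y}: "
        f"L^p on [0,1]^d_x -> {min_width_lp_compact(d_x, d_y)}, "
        f"L^p on R^d_x -> {min_width_lp_relu_whole_space(d_x, d_y)}, "
        f"uniform lower bound -> {uniform_lower_bound(d_x, d_y)}"
    )
```

For $(d_x, d_y) = (2, 3)$, for instance, the compact-domain $L^p$ width is $3$ while uniform approximation needs width at least $4$, which is the dichotomy the abstract refers to.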

Paper Structure

This paper contains 39 sections, 29 theorems, 104 equations, 4 figures, and 2 tables.

Key Result

Theorem 1

$w_{\min}=\max\{d_x, d_y, 2\}$ for ReLU networks to be dense in $L^p([0,1]^{d_x},\mathbb{R}^{d_y})$.

Figures (4)

  • Figure 1: Illustration of our encoder and decoder when $d_x=2$, $d_y=1$ and $k=4$. Our encoder $g$ first maps each element of $\{\mathcal{T}_1,\dots,\mathcal{T}_4\}$ to distinct scalar codewords $u_1,\dots,u_4$. Then, the decoder $h$ maps each codeword $u_i$ to $h(u_i)\approx f^*(z_i)$ for some $z_i \in \mathcal{T}_i$.
  • Figure 2: Construction of $f$ in the lemma labeled lem:tool1. (a) $f$ preserves points in the half-space $\mathcal{H}^+$, represented by the gray area, and projects points outside of $\mathcal{H}^+$ onto the boundary of $\mathcal{H}^+$. (b) Illustration of mapping $\mathcal{T}$ to a single point disjoint from $\mathcal{P} \cap \mathcal{H}_1^+$ when $d_x=2$: $f_1$ maps $\mathcal{T}$ onto the boundary of $\mathcal{H}_1^+$, and then $f_2$ maps $f_1(\mathcal{T})$ to the point $f_2(f_1(\mathcal{T}))$ while preserving points in $\mathcal{P} \cap \mathcal{H}_1^+$.
  • Figure 3: Illustration of the encoder when $d_x=2$ and $k=4$. For each cell $\mathcal{S}_i$ of the partition, the encoder maps $\mathcal{T}_i\subset\mathcal{S}_i$ to some point ${u_i}\notin\{u_1,\dots,u_{i-1}\}\cup\mathcal{S}_{i+1}\cup\cdots\cup\mathcal{S}_k$. After that, the encoder maps $u_1,\dots,u_k$ to distinct scalar values by projecting them.
  • Figure 4: $\mathcal{D}_1,\mathcal{D}_2$ and their corresponding images under $f^*$ when $d_x=2$ and $d_y=3$, illustrated by the grey squares and red lines in (a) and (b). One possible pair of images $f(\mathcal{D}_1)$ and $f(\mathcal{D}_2)$ is represented by the grey surface and red curve in (c).
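The encoder–decoder idea in Figures 1–3 can be illustrated with a rough NumPy sketch for $d_x=2$, $d_y=1$. It assumes an axis-aligned $m\times m$ grid as the partition $\{\mathcal{S}_i\}$, uses cell centres as the representatives $z_i$, a hypothetical target `f_star`, and the projection direction $(1, m)$ to obtain distinct scalar codewords $u_i$. The decoder is a continuous piecewise-linear interpolation through $(u_i, f^*(z_i))$, the kind of one-variable map a ReLU network can represent. This is not the paper's width-bounded network: the hard quantiser in `cell_index` is discontinuous, whereas the captions indicate that the paper's encoder acts on subsets $\mathcal{T}_i\subset\mathcal{S}_i$. The helper `halfspace_projection` writes the map of Figure 2(a) with a single ReLU, $P(x)=x+\mathrm{ReLU}(b-a\cdot x)\,a/\|a\|^2$, where $\mathcal{H}^+=\{x : a\cdot x\ge b\}$ is an assumed parametrization.

```python
import numpy as np


def relu(t):
    return np.maximum(t, 0.0)


def halfspace_projection(x, a, b):
    """Identity on H+ = {x : a.x >= b}; orthogonal projection onto its boundary elsewhere.

    One ReLU suffices: P(x) = x + ReLU(b - a.x) * a / ||a||^2 (cf. Figure 2(a)).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    a = np.asarray(a, dtype=float)
    return x + relu(b - x @ a)[:, None] * a / (a @ a)


def cell_index(x, m):
    """Index of the square of the m-by-m grid on [0,1]^2 containing each row of x."""
    ij = np.clip(np.floor(x * m).astype(int), 0, m - 1)
    return ij[:, 0] * m + ij[:, 1]


def cell_centres(m):
    """Centres of the m*m grid squares, ordered to match cell_index (stand-ins for z_i)."""
    c = (np.arange(m) + 0.5) / m
    xx, yy = np.meshgrid(c, c, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def build_encoder_decoder(f_star, m):
    z = cell_centres(m)
    direction = np.array([1.0, float(m)])  # separates all grid centres after projection
    codewords = z @ direction              # distinct scalar codewords u_i
    targets = f_star(z)                    # values f*(z_i) the decoder should output

    order = np.argsort(codewords)          # np.interp needs increasing knots
    u_sorted, t_sorted = codewords[order], targets[order]

    def encoder(x):
        # Hard quantisation to a cell, then lookup of that cell's codeword.
        return codewords[cell_index(np.atleast_2d(x), m)]

    def decoder(u):
        # Continuous piecewise-linear interpolation through (u_i, f*(z_i)).
        return np.interp(u, u_sorted, t_sorted)

    return encoder, decoder, codewords


def f_star(x):
    """Hypothetical target f*: [0,1]^2 -> R, for illustration only."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2


rng = np.random.default_rng(0)
samples = rng.random((20_000, 2))
for m in (2, 8, 32):  # m = 2 gives k = 4 cells, as in Figure 1
    g, h, _ = build_encoder_decoder(f_star, m)
    err = np.mean(np.abs(h(g(samples)) - f_star(samples)))
    print(f"k = {m * m:5d} cells, estimated L1 error = {err:.4f}")
```

The estimated $L^1$ error shrinks as the number of cells $k$ grows, which mirrors the role of the codewords in Figure 1: all information about $x$ passes through one scalar, yet the composition $h\circ g$ still approximates $f^*$ in an averaged sense.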

Theorems & Definitions (39)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9: Brouwer's fixed-point theorem (Brouwer)
  • Lemma 10
  • ...and 29 more