Minimum width for universal approximation using ReLU networks on compact domain

Namjun Kim; Chanho Min; Sejun Park

TL;DR

The paper determines the exact minimum width needed for universal approximation by ReLU networks on a compact domain, showing $w_{\min}=\max\{d_x,d_y,2\}$ for $L^p([0,1]^{d_x},\mathbb{R}^{d_y})$ and extending this to ReLU-like activations. It also proves a lower bound $w_{\min}\ge d_y+1$ for uniform approximation when $d_x<d_y\le2d_x$, revealing a dichotomy between the $L^p$ and uniform regimes. The key technique is a coding-based upper bound via an encoder–decoder construction that achieves the tight width $\max\{d_x,d_y,2\}$, contrasted with a topological argument for the uniform-case lower bound. The results clarify how compact versus unbounded domains affect the width needed for universal approximation and suggest extensions to RNNs and BRNNs. Overall, the work sharpens our understanding of the expressive power of deep, width-bounded networks and its dependence on the activation function.

Abstract

It has been shown that deep neural networks of a large enough width are universal approximators but they are not if the width is too small. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.
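
The three width formulas quoted in the abstract can be compared side by side with a short sketch. The function names below are hypothetical and chosen only for illustration; each function simply evaluates the corresponding formula from the abstract. The uniform-approximation function returns `None` outside the range $d_x<d_y\le2d_x$, which means only that this particular bound is not stated there, not that no bound exists.

```python
def lp_min_width_compact(dx, dy):
    """Exact minimum width for L^p approximation on [0,1]^dx (ReLU-like activations)."""
    return max(dx, dy, 2)


def lp_min_width_unbounded_relu(dx, dy):
    """Known minimum width for L^p approximation on R^dx with ReLU networks."""
    return max(dx + 1, dy)


def uniform_lower_bound(dx, dy):
    """Lower bound dy + 1 for uniform approximation, stated only when dx < dy <= 2*dx."""
    return dy + 1 if dx < dy <= 2 * dx else None


if __name__ == "__main__":
    # A few (dx, dy) pairs chosen for illustration.
    for dx, dy in [(1, 1), (3, 2), (2, 3)]:
        print(dx, dy,
              lp_min_width_compact(dx, dy),
              lp_min_width_unbounded_relu(dx, dy),
              uniform_lower_bound(dx, dy))
```

For example, with $d_x=3$ and $d_y=2$ the compact-domain width is 3 while the width on $\mathbb{R}^{d_x}$ is 4. With $d_x=2$ and $d_y=3$ the $L^p$ width is 3, whereas uniform approximation needs width at least 4, which is the dichotomy the abstract refers to.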

Paper Structure (39 sections, 29 theorems, 104 equations, 4 figures, 2 tables)

Introduction
Related works
Summary of results
Organization
Problem setup and notation
Main results
Tight upper bound on minimum width for $L^p$-approximation
Coding scheme and ReLU network implementation (proof of \ref{lem:ub-lp})
Approximating encoder using ReLU network (proof sketch of \ref{lem:encoder})
Lower bound on minimum width for uniform approximation
Conclusion
Definition of activation functions
Proof of upper bound in \ref{thm:lp-ub}
Additional notations
Our choices of $\alpha,\beta,\gamma$
...and 24 more sections

Key Result

Theorem 1

$w_{\min}=\max\{d_x, d_y, 2\}$ for ReLU networks to be dense in $L^p([0,1]^{d_x},\mathbb{R}^{d_y})$.
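
To make the notion of width in Theorem 1 concrete, here is a minimal sketch of a deep ReLU network whose hidden layers all have exactly $\max\{d_x,d_y,2\}$ neurons. It assumes that width means the maximum number of neurons in a hidden layer. The helper names (`init_narrow_net`, `forward`) and the random Gaussian weights are illustrative assumptions only. The sketch shows the architecture the theorem refers to, not the paper's construction of approximating weights.

```python
import random


def relu(v):
    """Apply ReLU coordinate-wise to a list of floats."""
    return [max(t, 0.0) for t in v]


def affine(W, b, v):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_narrow_net(dx, dy, depth, seed=0):
    """Random network with `depth` hidden layers of width max(dx, dy, 2)."""
    rng = random.Random(seed)
    width = max(dx, dy, 2)
    dims = [dx] + [width] * depth + [dy]
    layers = []
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((W, b))
    return width, layers


def forward(layers, x):
    """Affine map followed by ReLU on every hidden layer, then a final affine map."""
    h = list(x)
    for W, b in layers[:-1]:
        h = relu(affine(W, b, h))
    W, b = layers[-1]
    return affine(W, b, h)


# Example: dx = 3, dy = 2, so every hidden layer has 3 neurons.
width, layers = init_narrow_net(dx=3, dy=2, depth=5)
y = forward(layers, [0.2, 0.5, 0.9])
```

In this sketch the depth is a free parameter. The theorem constrains only the width, so an approximating network may need to be arbitrarily deep.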

Figures (4)

Figure 1: Illustration of our encoder and decoder when $d_x=2$, $d_y=1$ and $k=4$. Our encoder $g$ first maps each element of $\{\mathcal{T}_1,\dots,\mathcal{T}_4\}$ to distinct scalar codewords $u_1,\dots,u_4$. Then, the decoder $h$ maps each codeword $u_i$ to $h(u_i)\approx f^*(z_i)$ for some $z_i \in \mathcal{T}_i$.
Figure 2: Construction of $f$ in \ref{lem:tool1}. (a) $f$ preserves points in the half-space $\mathcal{H}^+$ represented by the gray area and projects points outside of $\mathcal{H}^+$ to the boundary of $\mathcal{H}^+$. (b) Illustrations of mapping $\mathcal{T}$ to a single point disjoint from $\mathcal{P} \cap \mathcal{H}_1^+$ when $d_x=2$: $f_1$ maps $\mathcal{T}$ onto the boundary of $\mathcal{H}_1^+$ and then $f_2$ maps $f_1(\mathcal{T})$ to the point $f_2(f_1(\mathcal{T}))$ while preserving points in $\mathcal{P} \cap \mathcal{H}_1^+$.
Figure 3: Illustration of the encoder when $d_x=2$ and $k=4$. For each partition $\mathcal{S}_i$, the encoder maps $\mathcal{T}_i\subset\mathcal{S}_i$ to some point ${u_i}\notin\{u_1,\dots,u_{i-1}\}\cup\mathcal{S}_{i+1}\cup\cdots\cup\mathcal{S}_k$. After that, the encoder maps $u_1,\dots,u_k$ to some distinct scalar values by projecting them.
Figure 4: $\mathcal{D}_1,\mathcal{D}_2$ and their corresponding images under $f^*$ when $d_x=2$ and $d_y=3$ are illustrated by the grey squares and red lines in (a) and (b). One possible pair of images $f(\mathcal{D}_1)$ and $f(\mathcal{D}_2)$ is represented by the grey surface and red curve in (c).
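
The encoder–decoder idea described in the caption of Figure 1 can be sketched as follows. This is only an idealized, piecewise-constant version under stated assumptions: the domain is split into a uniform grid of $m^{d_x}$ cells, the codeword is the cell index scaled into $[0,1)$, and $z_i$ is taken to be the cell center. The paper instead realizes the encoder and decoder with narrow ReLU networks acting on subsets $\mathcal{T}_i$, which this sketch does not attempt. The names `encode` and `make_decoder` are hypothetical.

```python
from itertools import product


def encode(x, m):
    """Map x in [0,1]^dx to a scalar codeword identifying its grid cell."""
    idx = 0
    for xi in x:
        cell = min(int(xi * m), m - 1)  # cell index along this axis
        idx = idx * m + cell            # mixed-radix index over all axes
    return idx / m ** len(x)            # distinct scalar in [0, 1) per cell


def make_decoder(f_star, dx, m):
    """Tabulate f_star at each cell center, keyed by that cell's codeword."""
    table = {}
    for cell in product(range(m), repeat=dx):
        center = [(c + 0.5) / m for c in cell]
        table[encode(center, m)] = f_star(center)
    return lambda u: table[u]


def f_star(z):
    """Illustrative target function with dx = 2 and dy = 1."""
    return z[0] + z[1]


# k = 4 cells, as in Figure 1 (m = 2 cells per axis, dx = 2).
decode = make_decoder(f_star, dx=2, m=2)
approx = decode(encode((0.1, 0.9), 2))  # value of f_star at the center of x's cell
```

Refining the grid (larger `m`) shrinks the cells, so the decoded value gets closer to the target for a continuous $f^*$. The scalar codeword is what allows the information about the input cell to pass through a narrow network.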

Theorems & Definitions (39)

Theorem 1
Theorem 2
Theorem 3
Lemma 4
Lemma 5
Lemma 6
Lemma 7
Lemma 8
Lemma 9: Brouwer's fixed-point theorem
Lemma 10
...and 29 more
