Minimum width for universal approximation using ReLU networks on compact domain
Namjun Kim, Chanho Min, Sejun Park
TL;DR
The paper determines the exact minimum width needed for universal approximation by ReLU networks on a compact domain, showing $w_{\min}=\max\{d_x,d_y,2\}$ for $L^p([0,1]^{d_x},\mathbb{R}^{d_y})$ and extending this to ReLU-like activations; it also proves a lower bound $w_{\min}\ge d_y+1$ for uniform approximation when $d_x<d_y\le 2d_x$, revealing a dichotomy between the $L^p$ and uniform regimes. The key technique is a coding-based upper bound via an encoder–decoder construction that achieves the tight width $\max\{d_x,d_y,2\}$, contrasted with a topological argument for the uniform-case lower bound. The results clarify how compact versus unbounded domains affect the width needed for universal approximation and suggest extensions to RNNs and BRNNs. Overall, the work sharpens our understanding of the expressive power of width-bounded deep networks and how it depends on the activation function.
Abstract
It has been shown that deep neural networks of sufficiently large width are universal approximators, but that they are not if the width is too small. There have been several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if the activation function is ReLU-like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result is the first to show that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le 2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.
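To make the comparison between the stated width formulas concrete, the following minimal Python sketch evaluates them for given input and output dimensions. The function names are hypothetical and do not come from the paper; the code only evaluates the formulas quoted in the abstract and does not construct or train any network.

```python
from typing import Optional


def lp_min_width_compact(d_x: int, d_y: int) -> int:
    """Minimum width for L^p approximation on [0,1]^{d_x}
    with a ReLU-like activation, as stated in the abstract."""
    return max(d_x, d_y, 2)


def min_width_unbounded(d_x: int, d_y: int) -> int:
    """Known minimum width for ReLU networks when the domain
    is all of R^{d_x}, as quoted in the abstract."""
    return max(d_x + 1, d_y)


def uniform_lower_bound(d_x: int, d_y: int) -> Optional[int]:
    """Lower bound on the minimum width for uniform approximation.
    The abstract states it only for d_x < d_y <= 2 * d_x, so this
    sketch returns None outside that regime."""
    if d_x < d_y <= 2 * d_x:
        return d_y + 1
    return None


if __name__ == "__main__":
    # Print the three quantities for a few small dimension pairs.
    for d_x, d_y in [(1, 1), (2, 1), (2, 3), (3, 5)]:
        print(
            (d_x, d_y),
            lp_min_width_compact(d_x, d_y),
            min_width_unbounded(d_x, d_y),
            uniform_lower_bound(d_x, d_y),
        )
```

For instance, with $d_x=2$ and $d_y=1$ the compact-domain width is 2 while the width on $\mathbb{R}^2$ is 3, which illustrates the gap between compact and unbounded domains. With $d_x=2$ and $d_y=3$, $L^p$ approximation needs width 3 whereas uniform approximation needs width at least 4, which illustrates the dichotomy.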
