Approximating Continuous Functions by ReLU Nets of Minimal Width
Boris Hanin, Mark Sellke
TL;DR
This work identifies a sharp width threshold for the expressive power of deep ReLU networks: to uniformly approximate every continuous real-valued function on $[0,1]^{d_{in}}$, the hidden layers must have width at least $d_{in}+1$, and width $d_{in}+1$ is enough. More generally, width $d_{in}+d_{out}$ suffices to approximate any continuous $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$, with quantitative depth bounds stated in terms of the modulus of continuity of $f$. The authors introduce max-min strings as a constructive framework. The upper bound rests on two steps: a ReLU realization of these strings by nets of width $d_{in}+d_{out}$, and a density result showing that continuous functions can be approximated by such strings. A complementary lower bound uses a topological argument on level sets to show that width below $d_{in}+1$ is insufficient. Together, the results clarify the trade-off between width and depth in the expressive power of deep ReLU nets.
Abstract
This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1$, what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w$, and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1$. That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions. On the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1$. Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.
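
As a small illustration of why a single neuron beyond $d_{in}$ already adds expressive power, the sketch below evaluates a maximum of affine functions with a ReLU net whose hidden layers all have width $d_{in}+1$. It is not the authors' max-min string construction, only a simplified special case (max only, $d_{out}=1$). The function name `narrow_max_net` and the arrays `A` and `b` are hypothetical names chosen for this example. The sketch assumes inputs lie in $[0,1]^{d_{in}}$, so that ReLU passes the input coordinates through unchanged.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def narrow_max_net(x, A, b):
    """Evaluate max_j (A[j] @ x + b[j]) for x in [0,1]^d_in.

    Every hidden layer holds d_in + 1 neurons:
      * d_in neurons carry x forward (ReLU is the identity on x >= 0),
      * one neuron stores t = ReLU(running_max - next_affine).
    The identity max(m, l) = ReLU(m - l) + l is applied once per layer,
    with the "+ l" folded into the next layer's affine map.
    """
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    k = A.shape[0]
    if k == 1:
        # A single affine function needs no hidden layer.
        return A[0] @ x + b[0]

    # First hidden layer: carry x, and compare the first two affine maps.
    carried = relu(x)
    t = relu((A[0] @ x + b[0]) - (A[1] @ x + b[1]))

    # Each further hidden layer is affine in (carried, t), then ReLU.
    for j in range(1, k - 1):
        running_max = t + A[j] @ carried + b[j]   # max of maps 0..j
        next_affine = A[j + 1] @ carried + b[j + 1]
        t = relu(running_max - next_affine)
        carried = relu(carried)                   # still equal to x

    # Affine output layer adds the last map back in.
    return t + A[k - 1] @ carried + b[k - 1]
```

Calling `narrow_max_net(x, A, b)` with `A` of shape `(k, d_in)` and `b` of shape `(k,)` should agree with `np.max(A @ x + b)` for any `x` in the unit cube. The depth grows linearly with the number `k` of affine pieces, which mirrors, in a much simpler setting, the width-for-depth trade-off the abstract describes.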
