An in-depth look at approximation via deep and narrow neural networks
Joris Dommel, Sven A. Wegner
TL;DR
The paper revisits the density threshold for deep ReLU networks near the Hanin–Sellke boundary, analyzing a concrete counterexample function $f$ to understand how depth interacts with width. It provides a streamlined proof that density requires $w>n$ and reports extensive experiments in the non-dense ($w=n$) and dense ($w=n+1$) regimes, comparing the sup-norm error $\|f-N\|_{K,\infty}$ with the MSE loss. The findings indicate that the best $\|f-N\|_{K,\infty}$ is $1/8$, attained by the constant $N_0\equiv 1/8$; the effect of depth depends on the dimension, and larger depths are plagued by dying ReLUs. In the dense regime, depth improves the approximation up to a threshold, beyond which it collapses. These results guide architecture choices near the width threshold and emphasize the difference between sup-norm performance and training-loss behavior under concentration phenomena.
Abstract
In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and which effect (spoiler alert: dying neurons) causes that behavior.
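To make the setting concrete, the following is a minimal sketch of a narrow ReLU network with input dimension n, hidden width w and variable depth, together with a simple count of dead neurons, meaning hidden units whose output is zero on every sample point. It is not the authors' code. The helper names (`init_network`, `forward`, `dead_neuron_fraction`), the He-style random initialization, and the choice of the unit cube as sampling domain are assumptions made for illustration only, and the counterexample function f is deliberately left out.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def init_network(n, w, depth, rng):
    """Random ReLU network: n inputs, `depth` hidden layers of width w, one real output."""
    sizes = [n] + [w] * depth
    hidden = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5  # He-style scaling (an assumption)
        W = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        hidden.append((W, b))
    out_W = [rng.gauss(0.0, (1.0 / w) ** 0.5) for _ in range(w)]
    return hidden, (out_W, 0.0)


def forward(net, x):
    """Return the network output and the list of hidden-layer activations."""
    hidden, (out_W, out_b) = net
    activations = []
    a = list(x)
    for W, b in hidden:
        a = [relu(sum(wij * aj for wij, aj in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
        activations.append(a)
    y = sum(wi * ai for wi, ai in zip(out_W, a)) + out_b
    return y, activations


def dead_neuron_fraction(net, samples):
    """Fraction of hidden neurons that output zero on every sample point."""
    hidden, _ = net
    alive = [[False] * len(b) for _, b in hidden]
    for x in samples:
        _, acts = forward(net, x)
        for layer_idx, layer in enumerate(acts):
            for j, value in enumerate(layer):
                if value > 0.0:
                    alive[layer_idx][j] = True
    total = sum(len(layer) for layer in alive)
    dead = sum(1 for layer in alive for flag in layer if not flag)
    return dead / total


rng = random.Random(0)
n = 2
# Sample points from the unit cube (an assumed stand-in for a compact set K).
samples = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(200)]
for w in (n, n + 1):          # the two widths around the threshold
    for depth in (2, 8, 32):  # illustrative depths only
        net = init_network(n, w, depth, rng)
        frac = dead_neuron_fraction(net, samples)
        print(f"w={w} depth={depth} dead fraction={frac:.2f}")
```

The sketch only inspects randomly initialized networks. Once a whole layer is dead, the output is constant on the sampled points, which is one way to see why dying neurons matter more in very narrow architectures. Whether and how strongly this shows up depends on the initialization and training procedure, which are not specified here.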
