An in-depth look at approximation via deep and narrow neural networks

Joris Dommel; Sven A. Wegner

TL;DR

The paper revisits the density threshold for deep ReLU networks near the Hanin–Sellke boundary, analyzing a concrete counterexample function $f$ to understand how depth interacts with width. It provides a streamlined proof that density requires $w>n$ and reports extensive experiments in the non-dense ($w=n$) and dense ($w=n+1$) regimes, comparing the sup-norm error $\|f-N\|_{K,\infty}$ with the MSE loss. The findings indicate that, in the non-dense regime, the best $\|f-N\|_{K,\infty}$ is $1/8$, attained by the constant network $N_0\equiv 1/8$, with depth effects that depend on the dimension and are hampered by dying ReLUs at larger depths; in the dense regime, depth improves the approximation up to a threshold, after which it collapses. These results guide architecture choices near the width threshold and emphasize the difference between sup-norm performance and training-loss behavior under concentration phenomena.

Abstract

In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward, ReLU-activated networks of width $w$ forms a dense subset of the space of continuous functions on $\mathbb{R}^n$, with respect to the topology of uniform convergence on compact sets, if and only if $w>n$ holds. To show the necessity, a concrete counterexample function $f\colon\mathbb{R}^n\to\mathbb{R}$ was used. In this note we actually approximate this very $f$ by neural networks in the two cases $w=n$ and $w=n+1$ around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and what effect (spoiler alert: dying neurons) causes that behavior.
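The quantities named above (a width-$w$ ReLU network of a given depth, the sup-norm error on a compact set $K$, the MSE, and dying neurons) can be made concrete with a small sketch. This is an illustration under stated assumptions and not the authors' experimental code. The cube $[-1,1]^n$ standing in for $K$, the grid resolution, the He-style initialization, and every function name are hypothetical. In particular, `stand_in_target` is not the counterexample $f$ from the paper, which is not reproduced here; it is merely some continuous function with values in $[0, 1/4]$, chosen so that the constant network $1/8$ has sup-norm error $1/8$ on the grid.

```python
import itertools
import random


def make_grid(n, points_per_axis=21, lo=-1.0, hi=1.0):
    """Regular grid on the cube [lo, hi]^n (an assumed stand-in for K)."""
    step = (hi - lo) / (points_per_axis - 1)
    axis = [lo + i * step for i in range(points_per_axis)]
    return [list(p) for p in itertools.product(axis, repeat=n)]


def init_network(n, w, depth, rng):
    """`depth` hidden ReLU layers of width w, then a scalar affine output."""
    dims = [n] + [w] * depth + [1]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        scale = (2.0 / d_in) ** 0.5  # He-style scaling (an assumption)
        W = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        layers.append((W, [0.0] * d_out))
    return layers


def constant_network(n, w, depth, c):
    """Network with all weights zero and output bias c, i.e. N(x) = c."""
    dims = [n] + [w] * depth + [1]
    layers = [([[0.0] * d_in for _ in range(d_out)], [0.0] * d_out)
              for d_in, d_out in zip(dims[:-1], dims[1:])]
    layers[-1][1][0] = c
    return layers


def affine(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(layers, x, record=None):
    """Evaluate the network; optionally record every hidden activation vector."""
    h = x
    for W, b in layers[:-1]:
        h = [max(0.0, z) for z in affine(W, b, h)]  # ReLU
        if record is not None:
            record.append(h)
    W, b = layers[-1]
    return affine(W, b, h)[0]


def sup_error(layers, target, points):
    """Grid estimate of the sup-norm error (a lower bound for the true sup)."""
    return max(abs(target(x) - forward(layers, x)) for x in points)


def mse(layers, target, points):
    return sum((target(x) - forward(layers, x)) ** 2 for x in points) / len(points)


def count_dead_neurons(layers, points):
    """Hidden units whose ReLU output is zero at every grid point."""
    alive = [[False] * len(W) for W, _ in layers[:-1]]
    for x in points:
        acts = []
        forward(layers, x, acts)
        for k, h in enumerate(acts):
            for j, v in enumerate(h):
                if v > 0.0:
                    alive[k][j] = True
    return sum(not a for row in alive for a in row)


def stand_in_target(x):
    """Hypothetical target with values in [0, 1/4] on [-1, 1]^n; NOT the paper's f."""
    return sum(t * t for t in x) / (4 * len(x))


if __name__ == "__main__":
    n = 2
    points = make_grid(n)
    rng = random.Random(0)
    for depth in (2, 8, 32):
        net = init_network(n, n, depth, rng)  # width w = n
        print(depth,
              sup_error(net, stand_in_target, points),
              mse(net, stand_in_target, points),
              count_dead_neurons(net, points))
```

The sketch only evaluates untrained networks; it shows how the two error measures and the dead-unit count could be computed, and makes no claim about the values reported in the paper. Since the grid is finite, `sup_error` can only underestimate the true supremum over $K$.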
