An in-depth look at approximation via deep and narrow neural networks

Joris Dommel, Sven A. Wegner

TL;DR

The paper revisits the density threshold for deep ReLU networks near the Hanin–Sellke boundary, analyzing a concrete counterexample function $f$ to understand how depth interacts with width. It provides a streamlined proof that density requires $w>n$ and reports extensive experiments in the non-dense ($w=n$) and dense ($w=n+1$) regimes, comparing the sup-norm error $\|f-N\|_{K,\infty}$ with MSE losses. The findings indicate that the best $\|f-N\|_{K,\infty}$ is $1/8$, attained by the constant $N_0\equiv 1/8$, with depth effects that depend on the dimension and are plagued by dying ReLUs at larger depths; in the dense regime, depth improves approximations up to a threshold before collapse. These results guide architecture choices near the width threshold and emphasize the difference between sup-norm performance and training-loss behavior under concentration phenomena.
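To make the distinction between sup-norm error and MSE concrete, here is a minimal sketch in plain Python. It does not use the paper's function $f$ or its experimental setup: the target `spike`, the grid on $K=[-1,1]$, and the zero approximator are all hypothetical choices made purely for illustration. The point is only that an approximation can have a small mean squared error while its uniform error stays large.

```python
def sup_and_mse(target, approx, points):
    """Return (max absolute error, mean squared error) over the sample points."""
    errors = [abs(target(x) - approx(x)) for x in points]
    sup_error = max(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return sup_error, mse


def spike(x, half_width=0.05):
    """Hypothetical target: a narrow tent of height 1 centred at 0 (not the paper's f)."""
    return max(0.0, 1.0 - abs(x) / half_width)


# Uniform grid on the compact set K = [-1, 1] (an assumption for this sketch).
points = [-1.0 + 2.0 * i / 2000 for i in range(2001)]

# Compare the target against the constant-zero approximator.
sup_err, mse = sup_and_mse(spike, lambda x: 0.0, points)
print(f"sup-norm error: {sup_err:.3f}, MSE: {mse:.4f}")
```

The zero function misses the spike entirely, so its sup-norm error equals the spike height, while the MSE is small because the spike occupies only a small part of the grid. A training loop that minimizes MSE would see little incentive to fix this.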

Abstract

In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width $w$ forms a dense subset of the space of continuous functions on $\mathbb{R}^n$, with respect to the topology of uniform convergence on compact sets, if and only if $w>n$ holds. To show the necessity, a concrete counterexample function $f\colon\mathbb{R}^n\to\mathbb{R}$ was used. In this note we actually approximate this very $f$ by neural networks in the two cases $w=n$ and $w=n+1$ around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and which effects (spoiler alert: dying neurons) cause that behavior.
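The objects discussed here, deep ReLU networks of fixed width and neurons that "die", can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: it assumes every hidden layer has exactly `width` neurons, uses a He-style random initialization with zero biases, and calls a neuron "dead" if it outputs zero on every sample point. The names `init_network`, `forward`, and `dead_fraction` are hypothetical.

```python
import random


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_network(n, width, depth, rng):
    """Random network R^n -> R with `depth` hidden ReLU layers of size `width`.

    Assumption: He-style Gaussian weights and zero biases (not taken from the paper).
    """
    hidden, fan_in = [], n
    for _ in range(depth):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(width)]
        hidden.append((weights, [0.0] * width))
        fan_in = width
    out_weights = [[rng.gauss(0.0, (1.0 / fan_in) ** 0.5) for _ in range(fan_in)]]
    return hidden, (out_weights, [0.0])


def forward(net, x):
    """Return the scalar output and the list of hidden activations."""
    hidden, (out_weights, out_bias) = net
    activations, h = [], x
    for weights, bias in hidden:
        h = [max(0.0, t) for t in affine(weights, bias, h)]  # ReLU
        activations.append(h)
    return affine(out_weights, out_bias, h)[0], activations


def dead_fraction(net, samples):
    """Fraction of hidden neurons that output 0 on every sample point."""
    hidden, _ = net
    alive = [[False] * len(weights) for weights, _ in hidden]
    for x in samples:
        _, activations = forward(net, x)
        for layer, h in enumerate(activations):
            for j, a in enumerate(h):
                if a > 0.0:
                    alive[layer][j] = True
    total = sum(len(row) for row in alive)
    return sum(1 for row in alive for a in row if not a) / total


rng = random.Random(0)
n = 2
samples = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(200)]
for depth in (2, 8, 32):
    net = init_network(n, width=n, depth=depth, rng=rng)  # the w = n case
    print(depth, dead_fraction(net, samples))
```

Once every neuron in some layer is dead, all later layers receive a constant input, so the network output is constant on the sampled set. Tracking `dead_fraction` across depths is one simple way to observe this kind of collapse in narrow networks.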

Paper Structure

This paper contains 5 sections, 1 theorem, 12 equations.

Key Result

Theorem 1

[HS17] For $n\in\mathbb{N}$ the space $\mathcal{D}^{\operatorname{ReLU},n}(\mathbb{R}^n)$ of neural networks is not dense in $\mathcal{C}(\mathbb{R}^n)$; indeed it even holds:

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • Remark 2