Approximating Continuous Functions by ReLU Nets of Minimal Width

Boris Hanin, Mark Sellke

TL;DR

This work identifies a sharp width threshold for the expressive power of deep ReLU networks. To uniformly approximate every continuous function on $[0,1]^{d_{in}}$, hidden widths must be at least $d_{in}+1$, while width $d_{in}+d_{out}$ suffices for universal approximation, with the required depth depending on the modulus of continuity. The authors introduce max-min strings as a constructive framework and prove the upper bound in two steps: an explicit ReLU realization of these strings, and a density result showing that continuous functions can be approximated by such strings, which yields nets of width $d_{in}+d_{out}$. A complementary lower bound, obtained from topological arguments on level sets, shows that width $d_{in}$ is insufficient. The results also give quantitative depth bounds in terms of the modulus of continuity, clarifying the trade-off between width and depth in deep ReLU nets.
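
To make the max-min strings concrete, the sketch below assumes they are iterated maxima and minima of affine functions: $v_1=\ell_1(x)$ and $v_j=\max(v_{j-1},\ell_j(x))$ or $\min(v_{j-1},\ell_j(x))$. This reading of Definition 1 is an assumption. The construction is our own illustration, not necessarily the paper's realization. It shows one way a ReLU net whose hidden layers all have width $d_{in}+1$ can compute such a string exactly on $[0,1]^{d_{in}}$, using the identities $\max(v,\ell)=\ell+\mathrm{ReLU}(v-\ell)$ and $\min(v,\ell)=\ell-\mathrm{ReLU}(\ell-v)$. The function names and the `(A, b, ops)` encoding are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def eval_maxmin_string(x, A, b, ops):
    # Row j of (A, b) is the affine map l_j(x) = A[j] @ x + b[j];
    # ops[j - 1] in {"max", "min"} combines the running value with l_j.
    v = A[0] @ x + b[0]
    for j in range(1, len(b)):
        l = A[j] @ x + b[j]
        v = max(v, l) if ops[j - 1] == "max" else min(v, l)
    return v


def maxmin_string_to_relu_net(A, b, ops):
    # Hidden state is (x, h), of width d + 1. x is copied unchanged, which is
    # valid only because x lies in [0, 1]^d, so ReLU(x) = x. The running value
    # is recovered affinely as v_j = l_j(x) + s_j * h_j, with s_j = +1 for
    # "max" and s_j = -1 for "min".
    d = A.shape[1]
    layers = []
    prev_w, prev_b, prev_s = A[0], b[0], 0.0  # v_0 = l_0(x) + 0 * h
    for j in range(1, len(b)):
        s = 1.0 if ops[j - 1] == "max" else -1.0
        W = np.zeros((d + 1, d + 1))
        c = np.zeros(d + 1)
        W[:d, :d] = np.eye(d)           # copy x through the layer
        W[d, :d] = s * (prev_w - A[j])  # h_j = ReLU(s * (v_{j-1} - l_j(x)))
        W[d, d] = s * prev_s
        c[d] = s * (prev_b - b[j])
        layers.append((W, c))
        prev_w, prev_b, prev_s = A[j], b[j], s
    out_w = np.concatenate([prev_w, [prev_s]])  # affine readout l(x) + s * h
    return layers, out_w, prev_b


def run_relu_net(x, layers, out_w, out_b):
    z = np.concatenate([x, [0.0]])  # dummy h; its first-layer weight is 0
    for W, c in layers:
        z = relu(W @ z + c)
    return out_w @ z + out_b


rng = np.random.default_rng(0)
d, L = 3, 6
A, b = rng.normal(size=(L, d)), rng.normal(size=L)
ops = ["max", "min", "max", "max", "min"]
layers, out_w, out_b = maxmin_string_to_relu_net(A, b, ops)
x = rng.uniform(size=d)
# The two printed values should agree up to floating-point rounding.
print(eval_maxmin_string(x, A, b, ops), run_relu_net(x, layers, out_w, out_b))
```

Each max or min costs one hidden layer, so a string of length $L$ uses $L-1$ hidden layers of width $d_{in}+1$. The nonnegativity of the inputs is what lets $x$ pass through the ReLUs unchanged.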

Abstract

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.
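
The depth estimates are stated in terms of the modulus of continuity $\omega_f(\delta)=\sup\{|f(x)-f(y)| : |x-y|\le\delta\}$, and the figure captions below use an inverse $\omega_f^{-1}(\varepsilon)$. We assume this is the generalized inverse $\sup\{\delta : \omega_f(\delta)\le\varepsilon\}$. As a rough illustration for $d_{in}=1$ only, the following sketch estimates both quantities on a uniform grid. The names `empirical_modulus` and `modulus_inverse` are hypothetical helpers, not taken from the paper.

```python
import numpy as np


def empirical_modulus(f, n=1000):
    # Grid estimate of omega_f(delta) = sup{|f(x) - f(y)| : |x - y| <= delta}
    # for f on [0, 1], evaluated at delta = k / n for k = 0, ..., n.
    x = np.linspace(0.0, 1.0, n + 1)
    y = f(x)
    lag_max = np.zeros(n + 1)
    for k in range(1, n + 1):
        # largest change between grid points exactly k steps apart
        lag_max[k] = np.max(np.abs(y[k:] - y[:-k]))
    # omega is non-decreasing in delta, so take a running maximum over lags;
    # the grid points x[k] = k / n double as the delta values.
    return x, np.maximum.accumulate(lag_max)


def modulus_inverse(deltas, omega, eps):
    # Assumed generalized inverse: largest grid delta with omega(delta) <= eps.
    # omega[0] = 0, so the index set is non-empty for eps >= 0.
    ok = np.nonzero(omega <= eps)[0]
    return deltas[ok[-1]]


deltas, omega = empirical_modulus(lambda t: np.sin(8 * t), n=1000)
print(modulus_inverse(deltas, omega, 0.1))  # rough grid estimate of omega^{-1}(0.1)
```

The grid samples only finitely many pairs, so `omega` never exceeds the true $\omega_f$ at the grid spacings, and the estimated inverse can therefore be too large. It is also resolved only to within $1/n$.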

Paper Structure

This paper contains 5 sections, 6 theorems, 49 equations, 2 figures.

Key Result

Theorem 1

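For every $d_{in},d_{out}\geq 1,$
$$d_{in}+1\leq w_{\min}(d_{in},d_{out})\leq d_{in}+d_{out},$$
where $w_{\min}(d_{in},d_{out})$ denotes the minimal width $w$ such that ReLU nets with input dimension $d_{in}$, output dimension $d_{out}$, hidden layer widths at most $w$, and arbitrary depth can approximate every continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ uniformly to arbitrary precision.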

Figures (2)

  • Figure 1: To extend an $\varepsilon$-approximation of $f$ from the inner disk of radius $r$ to the outer disk of radius $r'=r+\frac{\omega_f^{-1}(\varepsilon)^2}{r}$, we proceed in steps. At each step, we draw the triangle $X'ZY'$ as shown and apply the extension lemma to extend the approximation to a larger region. Because the outer disk $B_{r'}(P)$ is contained in the sector $X'ZY'$, no area of $B_{r'}(P)$ is lost when the extension lemma is applied.
  • Figure 2: In Figure 1, after the extension lemma is applied, the region on which $f$ is approximated has grown to include the shaded circular sector $X_0PY_0$, because this sector is contained in the union of the two shaded regions in Figure 1. Since $d(X,Y)\asymp \varepsilon$, applying the extension lemma to $O\left(\frac{r}{\varepsilon}\right)$ rotated copies of this configuration extends the region of $\varepsilon$-approximation from $B_r(P)$ to $B_{r'}(P)$.
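
The radius update in the Figure 1 caption determines how many extension rounds are needed to reach a target radius. Writing $\delta=\omega_f^{-1}(\varepsilon)$ and starting from a radius $r_0>0$, a short calculation gives the following count. This is our own back-of-the-envelope consequence of the captions, not a bound quoted from the paper.

$$
r' = r + \frac{\delta^2}{r}
\quad\Longrightarrow\quad
(r')^2 = r^2 + 2\delta^2 + \frac{\delta^4}{r^2} \;\ge\; r^2 + 2\delta^2,
\qquad\text{so}\qquad r_k^2 \ge r_0^2 + 2k\,\delta^2 .
$$

$$
K=\left\lceil \frac{R^2-r_0^2}{2\delta^2}\right\rceil \ \text{rounds suffice to reach } r_K\ge R,
\qquad
\sum_{k<K} O\!\left(\frac{r_k}{\varepsilon}\right) = O\!\left(\frac{R}{\varepsilon}\left(1+\frac{R^2-r_0^2}{\delta^2}\right)\right),
$$

where the second estimate assumes, as in Figure 2, that each round starting from radius $r_k<R$ costs $O(r_k/\varepsilon)$ applications of the extension lemma.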

Theorems & Definitions (11)

  • Theorem 1
  • Definition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proof
  • Proof
  • Lemma 5
  • Proof
  • Lemma 6
  • ...and 1 more