Error bounds for approximations with deep ReLU networks

Dmitry Yarotsky

TL;DR

This paper analyzes the expressive power of deep versus shallow ReLU networks in approximating functions from Sobolev spaces, focusing on the $L^\infty$-error and on complexity measured by the number of weights and computation units. It establishes upper bounds showing that deep networks can approximate smooth functions to accuracy $\epsilon$ with depth $O(\ln(1/\epsilon))$ and size $O(\epsilon^{-d/n}(\ln(1/\epsilon)+1))$, where $d$ is the input dimension and $n$ the smoothness order, and demonstrates that adaptive architectures can further reduce the complexity in the one-dimensional Lipschitz case to $O(1/(\epsilon\ln(1/\epsilon)))$. Parallel lower bounds, obtained via continuous nonlinear widths, VC-dimension, and adaptive-architecture arguments, reveal fundamental limits and gaps between fixed-architecture and function-dependent designs. The work highlights the depth efficiency of ReLU networks for smooth targets, contrasts ReLU with smooth activations, and discusses implications for model selection, architecture adaptation, and future theory-aligned network design.

Abstract

We study the expressive power of shallow and deep neural networks with piece-wise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks approximate smooth functions more efficiently than shallow networks. In the case of approximations of 1D Lipschitz functions, we describe adaptive depth-6 network architectures that are more efficient than the standard shallow architecture.

Paper Structure

This paper contains 17 sections, 13 theorems, 92 equations, 6 figures.

Key Result

Proposition 1

Let $\rho:\mathbb R\to\mathbb R$ be any continuous piece-wise linear function with $M$ breakpoints, where $1\le M<\infty$.

Figures (6)

  • Figure 1: A feedforward neural network having 3 input units (diamonds), 1 output unit (square), and 7 computation units with nonlinear activation (circles). The network has 4 layers and $16+8=24$ weights.
  • Figure 2: Fast approximation of the function $f(x)=x^2$ from Proposition \ref{th:x2}: (a) the "tooth" function $g$ and the iterated "sawtooth" functions $g_2, g_3$; (b) the approximating functions $f_m$; (c) the network architecture for $f_4$.
  • Figure 3: Functions $(\phi_m)_{m=0}^5$ forming a partition of unity for $d=1, N=5$ in the proof of Theorem \ref{th:gensmooth}.
  • Figure 4: Architecture of the network implementing the function $\widetilde{f}=\widetilde{f}_1+\widetilde{f}_2$ from Lemma \ref{th:rho}.
  • Figure 5: A function $f$ considered in the proof of Theorem 2 (for $d=2$).
  • ...and 1 more figure
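
The construction behind Figure 2 can be sketched numerically. The definitions below are not reproduced in this summary and are stated here as assumptions recalled from the paper: the tooth function is $g(x)=2x$ on $[0,1/2]$ and $g(x)=2(1-x)$ on $[1/2,1]$, $g_s$ is its $s$-fold composition, and $f_m(x)=x-\sum_{s=1}^{m} g_s(x)/2^{2s}$, with uniform error at most $2^{-2m-2}$ on $[0,1]$. The function names (`relu`, `tooth`, `approx_square`, `max_error`) are illustrative only.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def tooth(x):
    """Tooth function g on [0, 1], written as a sum of three ReLUs.

    For x in [0, 1/2] this equals 2x; for x in [1/2, 1] it equals 2(1 - x).
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def approx_square(x, m):
    """Assumed form of f_m: x minus the scaled sawtooth terms g_s(x) / 4**s."""
    g_s = x          # g_0 is the identity; each loop step composes one more tooth
    total = x
    for s in range(1, m + 1):
        g_s = tooth(g_s)
        total -= g_s / 4.0 ** s
    return total


def max_error(m, samples=10001):
    """Largest sampled deviation |f_m(x) - x^2| on a uniform grid over [0, 1]."""
    worst = 0.0
    for i in range(samples):
        x = i / (samples - 1)
        worst = max(worst, abs(approx_square(x, m) - x * x))
    return worst


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, max_error(m), 2.0 ** (-2 * m - 2))
```

Each additional composition of `tooth` corresponds to one more layer group in the network of panel (c), so under these assumptions the error shrinks by a factor of 4 per added level while the number of units grows only linearly in $m$.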

Theorems & Definitions (25)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • ...and 15 more