Error bounds for approximations with deep ReLU networks
Dmitry Yarotsky
TL;DR
This paper analyzes the expressive power of deep versus shallow ReLU networks in approximating functions from Sobolev spaces, focusing on $L^\infty$-error $\epsilon$ and complexity measured by the number of weights and units. It establishes upper bounds showing that deep networks can approximate smooth functions with depth $O(\ln(1/\epsilon))$ and size $O(\epsilon^{-d/n}(\ln(1/\epsilon)+1))$, where $d$ is the input dimension and $n$ is the smoothness order, and demonstrates that adaptive architectures can further reduce complexity in the one-dimensional Lipschitz case to $O(1/\epsilon)$ up to logarithmic factors. Parallel lower-bound results via continuous nonlinear widths, VC-dimension, and adaptive-architecture arguments reveal fundamental limits and gaps between fixed-architecture and function-dependent designs. The work highlights depth efficiency for smooth targets, contrasts ReLU with smooth activations, and discusses implications for model selection, architecture adaptation, and future theory-aligned network design.
Abstract
We study the expressive power of shallow and deep neural networks with piece-wise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks approximate smooth functions more efficiently than shallow networks. In the case of approximations of 1D Lipschitz functions, we describe adaptive depth-6 network architectures that are more efficient than the standard shallow architecture.
