Dense Neural Networks are not Universal Approximators
Levi Rauchwerger, Stefanie Jegelka, Ron Levie
TL;DR
This work addresses whether densely connected neural networks can universally approximate all Lipschitz functions when depth is fixed and weights are bounded. By reinterpreting dense networks as graph neural networks through computational kernels and applying a weak regularity lemma, the authors prove a deterministic compression bound: any dense ReLU network can be approximated by a bounded-size network in the computational cut distance, implying a fundamental expressivity ceiling. They connect this compression to VC-dimension-based lower bounds, showing that for sufficiently large input dimension, dense networks cannot achieve universal approximation of Lip$(d_0,d_L)$, thus highlighting a saturation phenomenon. The results motivate sparse connectivity as a necessary ingredient for true universality and provide experiments on MNIST that illustrate the predicted saturation. Overall, the paper offers a principled compression-based lens to understand expressivity limits in dense architectures and suggests directions for analyzing more structured networks.
Abstract
We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that dense neural networks do not possess this universality. Our argument is based on a model compression approach, combining the weak regularity lemma with an interpretation of feedforward networks as message-passing graph neural networks. We consider ReLU neural networks subject to natural constraints on weights and input and output dimensions, which model a notion of dense connectivity. Within this setting, we demonstrate the existence of Lipschitz continuous functions that cannot be approximated by such networks. This highlights intrinsic limitations of neural networks with dense layers and motivates the use of sparse connectivity as a necessary ingredient for achieving true universality.
