Architecture independent generalization bounds for overparametrized deep ReLU networks
Anandatheertha Bapu, Thomas Chen, Chun-Kai Kevin Chien, Patricia Muñoz Ewald, Andrew G. Moore
TL;DR
The paper addresses the puzzling generalization behavior of overparametrized deep ReLU networks by deriving architecture-independent generalization bounds that depend only on the metric geometry of the data, the regularity of the activation function, and the norms of the weights and biases. It shows that zero-loss minimizers can be constructed explicitly, without gradient descent, when the training sample size is bounded by the input space dimension, and it proves a uniform generalization bound that is independent of depth and width, using a Lipschitz-continuous activation and a data-driven Chamfer-distance bound. The key contributions are a priori and generalization bounds tied to data geometry, the explicit construction of zero-loss minimizers for ReLU networks, and a detailed comparison with VC-based probabilistic bounds. MNIST experiments support the theory: the bounds agree with the true test error to within a 22% margin on average. The work offers a data-geometry-centric explanation for generalization in the overparametrized regime and suggests that generalization can be understood and controlled through data structure and weight norms.
Abstract
We prove that overparametrized neural networks are able to generalize with a test error that is independent of the level of overparametrization, and independent of the Vapnik-Chervonenkis (VC) dimension. We prove explicit bounds that depend only on the metric geometry of the test and training sets, on the regularity properties of the activation function, and on the operator norms of the weights and the norms of the biases. For overparametrized deep ReLU networks with a training sample size bounded by the input space dimension, we explicitly construct zero-loss minimizers without the use of gradient descent, and prove a uniform generalization bound that is independent of the network architecture. We test our theoretical results in computational experiments on MNIST, and obtain agreement with the true test error within a 22% margin on average.
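
To give a sense of why a bound can depend only on data geometry and weight norms, the following is a minimal sketch of the generic Lipschitz argument, not the paper's actual bound or construction. It assumes a fully connected ReLU network with Euclidean norms; the helper names (`relu_net`, `lipschitz_upper_bound`, `nearest_train`), the layer sizes, and the synthetic data are all hypothetical. Since ReLU is 1-Lipschitz, the product of the operator norms of the weights bounds the network's Lipschitz constant, so the output at a test point can differ from the output at its nearest training point by at most that product times their distance, regardless of depth or width.

```python
import numpy as np

def relu_net(x, weights, biases):
    """Forward pass: ReLU on hidden layers, affine output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W.T + b, 0.0)
    return h @ weights[-1].T + biases[-1]

def lipschitz_upper_bound(weights):
    """Product of spectral (operator) norms of the weight matrices.

    This upper-bounds the Lipschitz constant because ReLU is 1-Lipschitz.
    """
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

def nearest_train(X_test, X_train):
    """Distance from each test point to its nearest training point, and its index."""
    diffs = X_test[:, None, :] - X_train[None, :, :]
    d = np.linalg.norm(diffs, axis=2)
    idx = d.argmin(axis=1)
    return d[np.arange(len(X_test)), idx], idx

# Synthetic illustration (all sizes and data are assumptions).
rng = np.random.default_rng(0)
dims = [8, 16, 16, 3]
weights = [rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]
biases = [rng.normal(size=dims[i + 1]) for i in range(len(dims) - 1)]

X_train = rng.normal(size=(5, dims[0]))
X_test = X_train[rng.integers(0, 5, size=20)] + 0.1 * rng.normal(size=(20, dims[0]))

lip = lipschitz_upper_bound(weights)
dist, idx = nearest_train(X_test, X_train)

# Output gap between each test point and its nearest training point.
gap = np.linalg.norm(
    relu_net(X_test, weights, biases) - relu_net(X_train[idx], weights, biases),
    axis=1,
)
bound = lip * dist  # geometry of the data times weight norms
```

In this sketch `gap <= bound` holds for every test point. The right-hand side involves only the test-to-train distances and the operator norms, which is the flavor of dependence the abstract describes; the paper's precise statement and constants should be taken from the paper itself.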
