On the Sample Complexity of One Hidden Layer Networks with Equivariance, Locality and Weight Sharing
Arash Behboodi, Gabriele Cesa
TL;DR
This paper analyzes how equivariance, locality, and weight sharing influence the sample complexity of one-hidden-layer networks through Rademacher complexity-based bounds. It derives dimension-free bounds for group-convolution and equivariant architectures, extends to max-pooling and multi-layer networks with mild dimension dependence, and provides a matching lower bound for the Rademacher complexity. The authors also connect the analysis to general equivariant networks on compact groups, weight-sharing schemes, and locally constrained filters, highlighting a trade-off between locality and expressivity via an uncertainty-principle argument. Empirical results on rotated MNIST and CIFAR-10 validate the theoretical bound's relevance and reveal consistent trends with respect to group size, pooling, and frequency-domain locality. Overall, the work clarifies when and how architectural biases like symmetry, locality, and weight sharing can improve generalization in neural networks, offering dimension-free insights and practical guidance for design choices in symmetry-aware models.
Abstract
Weight sharing, equivariance, and local filters, as in convolutional neural networks, are believed to contribute to the sample efficiency of neural networks. However, it is not clear how each of these design choices contributes to the generalization error. Through the lens of statistical learning theory, we aim to provide insight into this question by characterizing the relative impact of each choice on the sample complexity. We obtain lower and upper sample complexity bounds for a class of single hidden layer networks. For a large class of activation functions, the bounds depend merely on the norm of filters and are dimension-independent. We also provide bounds for max-pooling and an extension to multi-layer networks, both with mild dimension dependence. We provide a few takeaways from the theoretical results. Depending on the weight-sharing mechanism, non-equivariant weight sharing can yield a generalization bound similar to that of equivariant weight sharing. We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity. We conduct extensive experiments and highlight some consistent trends for these models.
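
To make the three design choices concrete, the following is a minimal sketch of a one-hidden-layer network that combines them. It is an illustration under stated assumptions, not the authors' code: the group is assumed to be the cyclic group Z_n acting by shifts, the activation is assumed to be ReLU, and the pooling is assumed to be an average over the group. The names `shift`, `group_conv`, `relu`, and `network` are hypothetical. Weight sharing appears as one filter reused at every group element, locality as a filter of length k smaller than n, and equivariance as the property checked at the end.

```python
import random

def shift(x, s):
    """Cyclic shift by s positions: result[i] = x[(i - s) mod n]."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def group_conv(x, w):
    """Cross-correlation over Z_n: out[g] = sum_i w[i] * x[(i + g) mod n].

    The same filter w is applied at every group element g (weight sharing).
    If len(w) < len(x), the filter is local: it only sees len(w) neighbours.
    """
    n = len(x)
    return [sum(w[i] * x[(i + g) % n] for i in range(len(w))) for g in range(n)]

def relu(v):
    return [max(0.0, t) for t in v]

def network(x, filters, u):
    """Group convolution -> ReLU -> average pooling over the group -> linear readout."""
    pooled = [sum(relu(group_conv(x, w))) / len(x) for w in filters]
    return sum(a * p for a, p in zip(u, pooled))

# Assumed toy sizes: signal length n, filter support k, number of channels.
random.seed(0)
n, k, channels = 8, 3, 4
x = [random.gauss(0, 1) for _ in range(n)]
filters = [[random.gauss(0, 1) for _ in range(k)] for _ in range(channels)]
u = [random.gauss(0, 1) for _ in range(channels)]

s = 3  # an arbitrary group element (shift amount)

# Equivariance of the hidden layer: shifting the input shifts the feature map.
lhs = group_conv(shift(x, s), filters[0])
rhs = shift(group_conv(x, filters[0]), s)
equivariant = all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))

# Invariance of the pooled output: shifting the input leaves the output unchanged.
invariant = abs(network(shift(x, s), filters, u) - network(x, filters, u)) < 1e-9

print(equivariant, invariant)
```

In this sketch the number of trainable parameters is `channels * k + channels`, independent of the input length n. This gives a rough intuition for why bounds that depend only on filter norms can avoid an explicit dimension dependence, although the sketch itself proves nothing about sample complexity.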
