Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro
TL;DR
This work develops a unified framework for norm-based capacity control in feed-forward networks with ReLU activations. It introduces a group-norm regularizer $\mu_{p,q}$ and its path-based counterpart $\phi_p$, analyzes when these controls yield width-independent (size-independent) generalization bounds, and investigates the convexity of the induced hypothesis classes. The paper establishes a sharp dichotomy. Width-independent generalization is achievable in certain parameter regimes, e.g., per-unit $\ell_1$ regularization, or overall $\ell_p$ regularization with $p\le 2$ and modest depth. For deeper networks, however, per-unit $\ell_p$ regularization with $p>1$ and overall $\ell_p$ regularization with $p>2$ let capacity grow with depth, often exponentially, unless the width is also constrained. The paper further connects per-unit regularization to convex neural networks in the two-layer setting, proves semi-norm properties of the group-path measures, and establishes hardness results that persist in certain regimes despite convexity. These results illuminate the fundamental limits of norm-based regularization for deep networks. They also underscore a trade-off among depth, regularization strength, and computational tractability, with implications for designing scalable training methods and for understanding why deep networks can be difficult to optimize.
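The TL;DR names $\mu_{p,q}$ and $\phi_p$ without defining them. The sketch below is a minimal illustration, assuming the usual definitions for a layered network with weight matrices $W_1,\dots,W_d$ and a single output. Neither definition is stated in this section. The per-layer group norm is $\mu_{p,q}(W)=\big(\sum_j (\sum_k |W_{jk}|^p)^{q/p}\big)^{1/q}$, where row $j$ holds the incoming weights of unit $j$ and $q=\infty$ means a max over units (per-unit control). The path norm is $\phi_p=\big(\sum_{\text{paths}}\prod_{e}|w_e|^p\big)^{1/p}$. The function names are hypothetical. The sketch checks two facts that follow from these definitions. First, $\phi_p$ is unchanged when a hidden unit's incoming weights are scaled by $c>0$ and its outgoing weights by $1/c$, which leaves a ReLU network's function unchanged. Second, $\phi_p\le\prod_i \mu_{p,\infty}(W_i)$.

```python
# Hypothetical sketch under assumed definitions (not the paper's code).
# A layered net is a list of matrices; layers[i][j][k] is the weight from
# unit k of layer i to unit j of layer i+1. The last layer has one row (one output).


def group_norm(W, p, q):
    """Assumed mu_{p,q} of one layer: l_p norm of each unit's incoming
    weights (one row of W), then l_q norm across units; q=inf takes the max."""
    row_norms = [sum(abs(w) ** p for w in row) ** (1.0 / p) for row in W]
    if q == float("inf"):
        return max(row_norms)
    return sum(r ** q for r in row_norms) ** (1.0 / q)


def path_norm(layers, p):
    """Assumed phi_p: (sum over input-output paths of prod |w_e|^p)^(1/p).
    Propagating a ones vector through the elementwise |W|^p matrices sums
    the per-path products without enumerating paths explicitly."""
    v = [1.0] * len(layers[0][0])  # one entry per input unit
    for W in layers:
        v = [sum(abs(w) ** p * x for w, x in zip(row, v)) for row in W]
    return sum(v) ** (1.0 / p)


def rescale_unit(layers, i, j, c):
    """Scale incoming weights of unit j (output of layer i) by c > 0 and its
    outgoing weights by 1/c. A ReLU net computes the same function afterwards."""
    new = [[row[:] for row in W] for W in layers]  # copy so the input is untouched
    new[i][j] = [w * c for w in new[i][j]]
    for row in new[i + 1]:
        row[j] /= c
    return new


layers = [
    [[1.0, -2.0], [0.5, 1.0], [3.0, 0.0]],  # 2 inputs -> 3 hidden units
    [[1.0, -1.0, 0.5]],                     # 3 hidden units -> 1 output
]
inf = float("inf")
per_unit_product = group_norm(layers[0], 2, inf) * group_norm(layers[1], 2, inf)
rescaled = rescale_unit(layers, 0, 2, 3.0)
print(path_norm(layers, 2), path_norm(rescaled, 2), per_unit_product)
```

Here $\phi_2=\sqrt{8.5}$ before and after rescaling. The product of per-unit norms is $3\times 1.5=4.5$ on the original weights and grows on the rescaled ones. This illustrates why a path-based measure can be preferable to a layer-wise one: it depends only on the function's equivalence class under ReLU rescaling, not on an arbitrary choice of representative weights.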
Abstract
We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.
