Norm-Based Capacity Control in Neural Networks

Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

TL;DR

This work develops a unified framework for norm-based capacity control in feed-forward networks with ReLU activations. It introduces a group-norm regularizer $\mu_{p,q}$ and its path-based counterpart $\phi_p$, analyzes when these controls yield size-independent (that is, width-independent) generalization bounds, and investigates the convexity of the induced hypothesis classes. The paper shows a precise dichotomy: width-independent generalization is achievable in certain parameter regimes (e.g., per-unit $\ell_1$ regularization, or overall $\ell_p$ regularization with $p\le 2$ and modest depth), but for deeper networks with $p>1$ or $p>2$ (respectively) the capacity grows with depth, often exponentially, unless width is also constrained. It further connects per-unit regularization to convex neural nets in the two-layer setting, proves semi-norm properties for the group and path measures, and establishes hardness results that persist despite convexity in certain regimes. These results clarify the limits of norm-based regularization for deep networks and point to a trade-off between depth, regularization strength, and computational tractability, with implications for designing scalable training methods and for understanding why deep networks can be difficult to optimize.
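The two regularizers named above can be illustrated for a small layered network. The sketch below is not taken from the paper: it assumes that $\mu_{p,q}$ is the $\ell_q$ norm, taken over all units, of the $\ell_p$ norm of each unit's incoming weights, and that $\phi_p$ is the $\ell_p$ norm of the vector of products of weights along every input-to-output path. The function names, the `(fan_out, fan_in)` weight layout, and the restriction to finite $p$ and $q$ are choices made for this illustration.

```python
import itertools
import numpy as np


def group_norm(weights, p, q):
    """Assumed mu_{p,q}: l_p norm of each unit's incoming weights, then l_q across units.

    weights[k] is assumed to have shape (fan_out, fan_in), so row j holds the
    incoming weights of unit j in the next layer. p and q must be finite.
    """
    per_unit = np.concatenate(
        [np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p) for W in weights]
    )
    return float(np.sum(per_unit ** q) ** (1.0 / q))


def path_norm(weights, p):
    """Assumed phi_p: l_p norm over all path products, computed layer by layer."""
    acc = np.ones(weights[0].shape[1])  # one entry per input coordinate
    for W in weights:
        # acc[j] becomes the sum over paths ending at unit j of |path product|^p
        acc = (np.abs(W) ** p) @ acc
    return float(np.sum(acc) ** (1.0 / p))


def path_norm_bruteforce(weights, p):
    """Same quantity by enumerating every input-to-output path (tiny nets only)."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    total = 0.0
    for path in itertools.product(*(range(n) for n in sizes)):
        prod = 1.0
        for k, W in enumerate(weights):
            prod *= abs(W[path[k + 1], path[k]])
        total += prod ** p
    return total ** (1.0 / p)


# Hypothetical 3-layer network: 3 inputs -> 4 hidden -> 5 hidden -> 1 output.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 3)), rng.normal(size=(5, 4)), rng.normal(size=(1, 5))]
print("mu_{1,2}:", group_norm(layers, p=1, q=2))
print("phi_2   :", path_norm(layers, p=2), path_norm_bruteforce(layers, p=2))
```

The layer-by-layer recursion in `path_norm` avoids enumerating paths, whose number grows as the product of the layer widths; the brute-force version is included only as a cross-check on small examples.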

Abstract

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

Paper Structure

This paper contains 33 sections, 22 theorems, 52 equations.

Key Result

Theorem 1

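For any $d,q\geq 1$, any $1\leq p <\infty$ and any set $S=\{x_1,\dots,x_m\}\subseteq\mathbb{R}^D$, the theorem gives two chains of upper bounds on Rademacher complexity, the second following from the first (the displayed inequalities are missing from this summary). In each chain, the second inequality holds only if $1\leq p \leq 2$. Here $\mathcal{R}^{\text{linear}}_{m,p,D}$ is the Rademacher complexity of $D$-dimensional linear predictors with unit $\ell_p$ norm with respect to a set of $m$ samples, and $p^*$ is the conjugate exponent of $p$, i.e. $\frac{1}{p^*} + \frac{1}{p}=1$.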
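The two auxiliary quantities in the statement can be written out explicitly. The following assumes the standard textbook definitions; the paper's exact normalization (for example, whether an absolute value appears inside the supremum) is not shown in this summary, and the symbols $\mathcal{F}$ and $\xi_i$ are introduced here only for illustration.

$$
p^* = \frac{p}{p-1} \quad (p>1), \qquad p^* = \infty \quad (p=1),
$$

$$
\mathcal{R}_m(\mathcal{F}) = \mathbb{E}_{\xi\in\{\pm 1\}^m}\left[\frac{1}{m}\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}\xi_i f(x_i)\right],
$$

where the $\xi_i$ are independent uniform random signs and $\mathcal{F}$ is the hypothesis class being measured. For instance, $p=2$ gives $p^*=2$, and $p=\tfrac{3}{2}$ gives $p^*=3$.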

Theorems & Definitions (23)

  • Claim 1
  • Theorem 1
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Corollary 7
  • Corollary 8
  • Theorem 9
  • ...and 13 more