Norm-Based Capacity Control in Neural Networks

Behnam Neyshabur; Ryota Tomioka; Nathan Srebro

TL;DR

This work develops a unified framework for norm-based capacity control in feed-forward networks with ReLU activations. It introduces a group-norm regularizer $\mu_{p,q}$ and its path-based counterpart $\phi_p$, analyzes when these controls yield size-independent (here, width-independent) generalization bounds, and investigates the convexity of the induced hypothesis classes. The paper shows a precise dichotomy: width-independent generalization is achievable in certain parameter regimes (e.g., per-unit $\ell_1$ regularization, or overall $\ell_p$ regularization with $p \le 2$, at modest depth), but for deeper networks under per-unit $\ell_p$ with $p > 1$ or overall $\ell_p$ with $p > 2$, the capacity grows with depth, often exponentially, unless the width is also constrained. It further connects per-unit regularization to convex neural networks in the two-layer setting, establishes semi-norm properties of the group-path measures, and proves hardness results that persist in certain regimes despite convexity. These results illuminate the fundamental limits of norm-based regularization for deep networks and highlight a trade-off between depth, regularization strength, and computational tractability, with implications for designing scalable training methods and for understanding why deep networks can be difficult to optimize.
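As a rough illustration of the two quantities named above, the Python sketch below computes a group norm and a path norm for a small bias-free ReLU network. It assumes definitions that this summary does not spell out: $\mu_{p,q}$ is taken to be, per layer, the $\ell_p$ norm of each unit's incoming weights followed by an $\ell_q$ norm across units ($q = \infty$ meaning the maximum), multiplied over layers; $\phi_p$ is taken to be the $\ell_p$ norm of the vector of weight products along every input-to-output path. The weight layout and all function names (`group_norm`, `path_norm`, `forward`, `rescale_unit`) are hypothetical. The example rescales one hidden unit (incoming weights multiplied by $c > 0$, outgoing weights divided by $c$), which leaves the ReLU network's output unchanged; under these assumed definitions the path norm is unchanged as well, while the group norm is not.

```python
import math

# Assumed (hypothetical) layout: weights[i][v][u] is the weight on the edge from
# unit u of layer i to unit v of layer i + 1. Biases are omitted.


def group_norm(weights, p, q):
    """Assumed mu_{p,q}: per layer, l_p norm of each unit's incoming weights,
    then l_q norm across the units (q = inf means max), multiplied over layers."""
    total = 1.0
    for W in weights:
        unit_norms = [sum(abs(w) ** p for w in row) ** (1.0 / p) for row in W]
        if math.isinf(q):
            layer_norm = max(unit_norms)
        else:
            layer_norm = sum(n ** q for n in unit_norms) ** (1.0 / q)
        total *= layer_norm
    return total


def path_norm(weights, p):
    """Assumed phi_p: l_p norm of the products of |weights| along all
    input-to-output paths, accumulated layer by layer instead of enumerating paths."""
    acc = [1.0] * len(weights[0][0])  # one running sum per input unit
    for W in weights:
        # acc[v] = sum over incoming edges u -> v of |w|^p * acc[u]
        acc = [sum(abs(w) ** p * a for w, a in zip(row, acc)) for row in W]
    return sum(acc) ** (1.0 / p)


def forward(weights, x):
    """Bias-free network: ReLU on hidden layers, linear output layer."""
    h = list(x)
    for i, W in enumerate(weights):
        h = [sum(w * a for w, a in zip(row, h)) for row in W]
        if i < len(weights) - 1:
            h = [max(0.0, z) for z in h]
    return h


def rescale_unit(weights, layer, unit, c):
    """Scale the incoming weights of hidden unit `unit` (an output of weights[layer])
    by c > 0 and its outgoing weights by 1 / c; returns a new copy."""
    new = [[list(row) for row in W] for W in weights]
    new[layer][unit] = [c * w for w in new[layer][unit]]
    for row in new[layer + 1]:
        row[unit] /= c
    return new


# Toy network: 2 inputs, 3 hidden ReLU units, 1 linear output.
weights = [
    [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]],  # input -> hidden
    [[2.0, -1.0, 0.5]],                       # hidden -> output
]
scaled = rescale_unit(weights, 0, 0, 10.0)

print(forward(weights, [1.0, 2.0]), forward(scaled, [1.0, 2.0]))
print(path_norm(weights, 2), path_norm(scaled, 2))
print(group_norm(weights, 2, 2), group_norm(scaled, 2, 2))
```

Because the sum over paths factorizes layer by layer, `path_norm` propagates running sums rather than enumerating paths, so its cost is linear in the number of weights instead of in the number of paths. Under these assumed definitions, $\phi_p \le \mu_{p,p}$ always holds: expanding the product of the per-layer sums of $|w|^p$ produces every path term plus additional nonnegative cross terms.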

Abstract

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.