Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks

Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian

TL;DR

The paper analyzes the VC-dimension and pseudodimension of deep neural networks with piecewise-linear and piecewise-polynomial activations, focusing on how the number of weights W, the number of layers L, and the number of nonlinear units U affect capacity. It introduces a refined bit-extraction construction to achieve a nearly tight Ω(WL log(W/L)) lower bound, and it provides an O(WL log W) upper bound for piecewise-linear activations, a tight Θ(WU) bound in terms of nonlinear units, and a general O(WU log((d+1)p)) bound for piecewise-polynomial activations. Combined with earlier results, the bounds show how VC-dimension depends on depth: no dependence for piecewise-constant, linear dependence for piecewise-linear, and at most quadratic dependence for piecewise-polynomial activations. The bounds improve on the previously known upper and lower bounds.

Abstract

We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. These bounds are tight for almost the entire range of parameters. Letting $W$ be the number of weights and $L$ be the number of layers, we prove that the VC-dimension is $O(W L \log W)$, and provide examples with VC-dimension $\Omega(W L \log(W/L))$. This improves both the previously known upper bounds and lower bounds. In terms of the number $U$ of non-linear units, we prove a tight bound $\Theta(W U)$ on the VC-dimension. All of these bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimensions of these function classes. Combined with previous results, this gives an intriguing range of dependencies of the VC-dimension on depth for networks with different non-linearities: there is no dependence for piecewise-constant, linear dependence for piecewise-linear, and no more than quadratic dependence for general piecewise-polynomial.
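
To get a feel for how close the two bounds in the abstract are, the following minimal sketch compares their growth rates numerically. It is only an illustration: the constants hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation are dropped, the base-2 logarithm is an arbitrary choice, and the function names `upper_rate`, `lower_rate` and `gap` are hypothetical helpers, not anything defined in the paper.

```python
import math


def upper_rate(W: int, L: int) -> float:
    """Growth rate W * L * log2(W) of the upper bound (constant factor dropped)."""
    return W * L * math.log2(W)


def lower_rate(W: int, L: int) -> float:
    """Growth rate W * L * log2(W / L) of the lower bound (constant factor dropped).

    Only meaningful for L < W, so that log2(W / L) is positive.
    """
    return W * L * math.log2(W / L)


def gap(W: int, L: int) -> float:
    """Ratio of the two rates; it simplifies to log(W) / log(W / L)."""
    return upper_rate(W, L) / lower_rate(W, L)


if __name__ == "__main__":
    W = 2**20
    for L in (2, 2**5, 2**10, 2**15):
        print(f"L = {L:>6}   upper/lower = {gap(W, L):.2f}")
```

With $W = 2^{20}$, the ratio is 2 at $L = 2^{10}$ and 4 at $L = 2^{15}$: the rates differ by a small factor unless $L$ is nearly as large as $W$.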

Paper Structure

This paper contains 8 sections, 11 theorems, 34 equations, 2 figures.

Key Result

Theorem 3

There exists a universal constant $C$ such that the following holds. Given any $W,L$ with $W > CL > C^2$, there exists a ReLU network with $\leq L$ layers and $\leq W$ parameters with VC-dimension $\geq WL \log(W/L)/C$.
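
A short derivation shows when this lower bound matches the $O(W L \log W)$ upper bound stated in the abstract. The exponent $\epsilon > 0$ is an illustrative parameter introduced here, not notation from the paper. Suppose $W \geq L^{1+\epsilon}$. Then

$$
L \leq W^{1/(1+\epsilon)}
\;\Longrightarrow\;
\frac{W}{L} \geq W^{\epsilon/(1+\epsilon)}
\;\Longrightarrow\;
\log\frac{W}{L} \geq \frac{\epsilon}{1+\epsilon}\,\log W ,
$$

so $W L \log(W/L) \geq \frac{\epsilon}{1+\epsilon}\, W L \log W$. Under this assumption the lower and upper bounds agree up to a constant factor that depends only on $\epsilon$; a gap can open only when $L$ is nearly as large as $W$.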

Figures (2)

  • Figure 1: The ReLU network used to extract the most significant $r$ bits of a number. Unlabeled edges indicate a weight of 1 and missing edges indicate a weight of 0.
  • Figure 2
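
The bit-extraction idea behind Figure 1 can be illustrated with a minimal sketch. It is not the network drawn in the figure, only an example of the general technique under stated assumptions: the input `x` lies in $[0, 1)$ and is an exact multiple of $2^{-n}$, and the names `hard_step` and `extract_top_bits` are hypothetical. Because the input sits on a grid, a difference of two ReLUs reproduces the threshold indicator exactly at every grid point.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def hard_step(z: float, eps: float) -> float:
    """Difference of two ReLUs that equals 1 for z >= 0 and 0 for z <= -eps.

    On the open interval (-eps, 0) it interpolates linearly, so it matches the
    indicator 1[z >= 0] only on inputs that avoid that interval.
    """
    return relu(z / eps + 1.0) - relu(z / eps)


def extract_top_bits(x: float, r: int, n: int) -> list:
    """Return the r most significant bits of x = 0.b1 b2 ... bn (binary).

    Assumes 0 <= x < 1, x is a multiple of 2**-n, and r <= n.
    """
    eps = 2.0 ** -n
    bits = []
    for _ in range(r):
        b = hard_step(x - 0.5, eps)  # leading bit: is x at least 1/2?
        bits.append(int(b))
        x = 2.0 * x - b              # drop the leading bit, shift left by one
    return bits


if __name__ == "__main__":
    # 11/16 = 0.1011 in binary
    print(extract_top_bits(11 / 16, r=3, n=4))  # [1, 0, 1]
```

Each loop iteration uses only affine maps and ReLUs, so unrolling the loop gives a ReLU network whose depth grows with the number of extracted bits `r`.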

Theorems & Definitions (17)

  • Definition 1: growth function, VC-dimension, shattering
  • Definition 2: pseudodimension
  • Theorem 3: Main lower bound
  • Remark 4
  • Theorem 5
  • Theorem 6: Main upper bound
  • Remark 7
  • Theorem 8
  • Theorem 9
  • Remark 10
  • ...and 7 more
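
For readers unfamiliar with the terms in Definition 1, here is a minimal brute-force sketch of the standard notions. A class of binary classifiers shatters a set of $m$ points if it realizes all $2^m$ labelings of them, and the VC-dimension is the size of the largest shattered set. The example class (one-dimensional thresholds) and the helper names `labelings`, `is_shattered` and `make_threshold` are assumptions chosen for illustration, not taken from the paper.

```python
def labelings(hypotheses, points):
    """Set of distinct label vectors the hypotheses produce on the points."""
    return {tuple(h(x) for x in points) for h in hypotheses}


def is_shattered(hypotheses, points):
    """True if every one of the 2**m labelings of the m points is realized."""
    return len(labelings(hypotheses, points)) == 2 ** len(points)


def make_threshold(t):
    """Threshold classifier x -> 1[x >= t]."""
    return lambda x: int(x >= t)


# Three thresholds are enough to realize every labeling that any threshold
# can produce on the points 1.0 and 2.0, since only the position of t
# relative to those points matters.
thresholds = [make_threshold(t) for t in (0.5, 1.5, 2.5)]

if __name__ == "__main__":
    print(is_shattered(thresholds, [1.0]))       # True
    print(is_shattered(thresholds, [1.0, 2.0]))  # False: (1, 0) is never produced
```

Thresholds shatter a single point but never two, so this class has VC-dimension 1.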