Representation Benefits of Deep Feedforward Networks

Matus Telgarsky

TL;DR

This work studies depth versus width in ReLU networks by constructing a family of classification tasks indexed by $k$, each consisting of $n=2^k$ points on $[0,1]$ with alternating labels. These tasks yield an exponential separation: any shallow network with $l$ layers and $m \le 2^{(k-3)/l - 1}$ nodes per layer incurs classification error at least $1/6$, whereas a deep network with $2$ nodes per layer across $2k$ layers (or a $3$-node recurrent network iterated $k$ times) achieves zero error. The argument counts the oscillations of sawtooth functions and uses a mirror map $f_m$ to realize the exact fit; a refined $n$-alternating-point problem sharpens the bounds. The results formalize an exponential advantage of depth in expressive power on finite data, connect to classical circuit complexity and VC-dimension results, and show how depth can sharply reduce the resources needed for exact representation.
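As a rough illustration of the scale of this bound, here is one worked instance. The values $k=13$ and $l=2$ are chosen here purely for convenience and do not come from the paper:

$$
k = 13,\quad l = 2:\qquad n = 2^{13} = 8192,\qquad m \le 2^{(13-3)/2 - 1} = 2^{4} = 16 .
$$

Under these assumed values, every $2$-layer network with at most $16$ nodes per layer errs on at least $1/6$ of the $8192$ points, while the deep network uses $2$ nodes in each of $2k = 26$ layers and fits all of them.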

Abstract

This note provides a family of classification problems, indexed by a positive integer $k$, where all shallow networks with fewer than exponentially (in $k$) many nodes exhibit error at least $1/6$, whereas a deep network with 2 nodes in each of $2k$ layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated $k$ times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.
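The zero-error deep network can be sketched in a few lines. This is a minimal illustration, not the paper's code, and it rests on two assumptions not spelled out in the text above: that the mirror map can be written with ReLUs as $f_m(x) = \sigma(2\sigma(x) - 4\sigma(x - 1/2))$, and that the alternating-label points are $x_i = i/2^k$ with label $i \bmod 2$. The function names (`mirror`, `deep_net`, and so on) are hypothetical.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def mirror(x):
    """Assumed ReLU form of the mirror map on [0, 1].

    First layer: two ReLU nodes, relu(x) and relu(x - 0.5).
    Second layer: one ReLU node applied to their linear combination.
    The result is the tent 2x on [0, 1/2] and 2(1 - x) on [1/2, 1].
    """
    return relu(2.0 * relu(x) - 4.0 * relu(x - 0.5))


def deep_net(x, k):
    """Compose the mirror map k times: 2k layers, at most 2 nodes each."""
    for _ in range(k):
        x = mirror(x)
    return x


def alternating_points(k):
    """Assumed dataset: n = 2^k points i / 2^k with label i mod 2."""
    n = 2 ** k
    return [(i / n, i % 2) for i in range(n)]


def classification_error(k):
    """Fraction of points misclassified after thresholding at 1/2."""
    points = alternating_points(k)
    mistakes = 0
    for x, y in points:
        prediction = 1 if deep_net(x, k) >= 0.5 else 0
        if prediction != y:
            mistakes += 1
    return mistakes / len(points)
```

Calling `classification_error(k)` for small `k` should return `0.0`, since every input is a dyadic rational and the arithmetic is exact in floating point. Reusing the same `mirror` weights on each pass of the loop is also what makes a recurrent network iterated $k$ times a natural alternative.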
