Towards Lower Bounds on the Depth of ReLU Neural Networks

Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella

TL;DR

This work tackles the fundamental question of depth versus expressivity in ReLU networks by combining mixed-integer optimization, polyhedral geometry, and tropical geometry to study which piecewise linear functions are exactly representable as depth grows. It provides a conditional depth lower bound via a mixed-integer programming (MIP) argument, shows for k ≥ 2 that ReLU(k) strictly contains MAX(2^k), and develops polynomial-width bounds for representing continuous piecewise linear (CPWL) functions through Newton polyhedra and convex–concave decompositions. A key contribution is linking neural-function representability to Newton polytopes and polyhedral complexes, which gives a geometric lens on depth and width that complements universal approximation results. Overall, the paper advances the theoretical understanding of depth-based expressivity limits and opens avenues for geometric and combinatorial proofs of depth lower bounds beyond small-scale MIP evidence.

Abstract

We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.

Paper Structure

This paper contains 20 sections, 30 theorems, 54 equations, 5 figures.

Key Result

Theorem 1.1

If $n\in\mathbb{N}$ and $k^*\coloneqq\lceil\log_2(n+1)\rceil$, then $\operatorname{CPWL}_n = \operatorname{ReLU}_n(k^*)$.
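To get a feel for how slowly the bound $k^*$ in Theorem 1.1 grows with the input dimension $n$, the following Python sketch evaluates $\lceil\log_2(n+1)\rceil$ with exact integer arithmetic. The function name `k_star` is hypothetical and chosen only for this illustration; reading $k^*$ as a number of hidden layers is an assumption based on the figure captions, not something this summary states explicitly.

```python
def k_star(n: int) -> int:
    """Return ceil(log2(n + 1)) for a positive integer n.

    Uses the identity ceil(log2(m)) == (m - 1).bit_length() for m >= 1,
    applied with m = n + 1, to avoid floating-point rounding.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n.bit_length()


if __name__ == "__main__":
    # Tabulate the bound for a few input dimensions.
    for n in (1, 2, 3, 4, 7, 8, 100):
        print(f"n = {n:3d}  ->  k* = {k_star(n)}")
```

The bound increases by one only when $n+1$ passes a power of two, so, for example, $n = 4$ through $n = 7$ all share the same value of $k^*$.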

Figures (5)

  • Figure 1: An NN with two input neurons, labeled $x_1$ and $x_2$, three hidden neurons, labeled with the shape of the rectifier function, and one output neuron, labeled $y$. The arcs are labeled with their weights and all biases are zero. The NN has depth 2, width 3, and size 3. It computes the function $x\mapsto y= \max\{0,x_1-x_2\}+\max\{0,x_2\}-\max\{0,-x_2\}= \max\{0,x_1-x_2\}+x_2=\max\{x_1,x_2\}$.
  • Figure 2: An NN to compute the maximum of four numbers that consists of three copies of the NN in Figure 1. Note that no activation function is applied at the two unlabeled middle vertices (representing $\max\{x_1,x_2\}$ and $\max\{x_3,x_4\}$). Therefore, the linear transformations directly before and after these vertices can be combined into a single one. Thus, the network has total depth three (two hidden layers).
  • Figure 3: Set of breakpoints of the function $\max\{0,x_1,x_2\}$ (left). This function cannot be computed by a 2-layer NN (middle), since the set of breakpoints of any function computed by such an NN is always a union of lines (right).
  • Figure 4: A function is $H$-conforming if the set of breakpoints is a subset of the hyperplane arrangement $H$. The arrangement $H$ consists of all hyperplanes where two of the coordinates (possibly including $x_0=0$) are equal. Here, $H$ is illustrated for the (simpler) two-dimensional case, where it consists of three hyperplanes that divide the space into six cells.
  • Figure 5: Set of polytopes that can arise as Newton polytopes of convex CPWL functions computed by (parts of) a 2-hidden-layer NN.
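
The constructions described in the captions of Figures 1 and 2 can be checked numerically. The sketch below is a minimal illustration, not the authors' code: `max2` mirrors the three hidden neurons from the Figure 1 caption (all biases zero, output weights +1, +1, -1), and `max4` composes three copies of it as in Figure 2. The function names are hypothetical.

```python
def relu(z: float) -> float:
    """Rectifier activation: max{0, z}."""
    return max(0.0, z)


def max2(x1: float, x2: float) -> float:
    """One-hidden-layer network from the Figure 1 caption.

    Hidden neurons: relu(x1 - x2), relu(x2), relu(-x2).
    Output: h1 + h2 - h3, which equals max{x1, x2} because
    relu(x2) - relu(-x2) == x2.
    """
    h1 = relu(x1 - x2)
    h2 = relu(x2)
    h3 = relu(-x2)
    return h1 + h2 - h3


def max4(x1: float, x2: float, x3: float, x4: float) -> float:
    """Three copies of max2, arranged as in the Figure 2 caption.

    The two inner outputs are passed on without any activation,
    so the result uses two hidden layers in total.
    """
    left = max2(x1, x2)
    right = max2(x3, x4)
    return max2(left, right)


if __name__ == "__main__":
    print(max2(1.5, -2.0))
    print(max4(1.0, 4.0, -3.0, 2.0))
```

Nesting `max2` in this balanced way doubles the number of inputs handled with each additional hidden layer, which is consistent with the logarithmic depth appearing in Theorem 1.1.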

Theorems & Definitions (65)

  • Theorem 1.1 (Arora et al.)
  • Lemma 1.2 (Arora et al.)
  • Theorem 1.3 (Wang and Sun, 2005)
  • Conjecture 1.4
  • Conjecture 1.5
  • Proposition 1.6
  • proof
  • Theorem 1.7
  • Theorem 1.8
  • Theorem 1.9
  • ...and 55 more