Better Neural Network Expressivity: Subdividing the Simplex

Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, Amir Yehudayoff

TL;DR

The paper studies the exact depth that ReLU networks need in order to represent all CPWL functions on $\mathbb{R}^n$. It refutes the conjecture that the known upper bound of $\lceil \log_2(n+1)\rceil$ hidden layers is optimal by proving that $\lceil \log_3(n-2)\rceil+1$ hidden layers suffice for $\mathsf{MAX}_n$, which in turn implies $\mathsf{CPWL}_n=\mathsf{ReLU}_{n,\lceil\log_3(n-1)\rceil+1}$. The key technical contributions are an explicit two-hidden-layer construction for $\mathsf{MAX}_5$ and a general inductive method, based on the $T_{a,b}$ operator, that computes $\mathsf{MAX}_{3^n+2}$ with $n+1$ hidden layers, together with a geometric subdivision viewpoint that links depth to Minkowski sums and Newton polytopes. Within this subdivision-based framework the authors prove two main claims: (i) $\mathsf{MAX}_5$ can be computed with two hidden layers, and (ii) $\mathsf{MAX}_{3^n+2}$ can be computed with $n+1$ hidden layers for every $n\geq 1$, which yields the bound $\lceil \log_3(n-2)\rceil+1$ for the maximum of any $n\geq 4$ numbers, with weights consisting of binary fractions. This work shifts attention from lower bounds to constructive depth upper bounds via polyhedral subdivisions, opening avenues for tightening the bounds further or closing the remaining gaps (e.g., $\mathsf{MAX}_6$) and for exploring the trade-offs between depth and size in CPWL expressivity.
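
For context, the $\log_2$-type bound that the paper improves on comes from the classical pairwise "tournament" construction of the maximum. The sketch below illustrates that baseline only, not the paper's new construction. The function names (`relu`, `max2`, `tournament_max`) are hypothetical, and plain Python arithmetic stands in for network layers.

```python
def relu(x):
    return max(x, 0.0)

def max2(a, b):
    # One hidden layer with three ReLU units:
    # max(a, b) = relu(a - b) + relu(b) - relu(-b)
    return relu(a - b) + relu(b) - relu(-b)

def tournament_max(values):
    """Return (maximum, hidden layers used) for the pairwise tournament."""
    layer = list(values)
    depth = 0
    while len(layer) > 1:
        # Each round of pairwise maxima costs one hidden layer.
        nxt = [max2(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            # An unpaired value passes through; in a real network this is
            # the identity x = relu(x) - relu(-x) in the same layer.
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], depth

value, depth = tournament_max([3, -1, 7, 2, 5])
print(value, depth)
```

For five inputs the tournament needs three rounds, whereas the paper's result states that two hidden layers already suffice for $\mathsf{MAX}_5$.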

Abstract

This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21 / SIDMA'23) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into "easier" polytopes.
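
To get a feel for the size of the improvement, the bounds quoted in the abstract can be tabulated for small $n$. This is a minimal sketch: the helper names are hypothetical, exact integer arithmetic replaces floating-point logarithms, and reading the $\lceil\log_3(n)\rceil$ lower bound as a bound on the maximum of $n$ numbers is an assumption made here for illustration.

```python
def ceil_log(base, x):
    """Smallest integer k >= 0 with base**k >= x (exact integer arithmetic)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k

def cpwl_depth_old(n):
    # ceil(log2(n + 1)): previously known depth for CPWL functions on R^n
    return ceil_log(2, n + 1)

def cpwl_depth_new(n):
    # ceil(log3(n - 1)) + 1: new depth for CPWL functions on R^n
    return ceil_log(3, n - 1) + 1

def max_depth_new(n):
    # ceil(log3(n - 2)) + 1: new depth for the maximum of n >= 4 numbers
    return ceil_log(3, n - 2) + 1

def max_lower_bound_decimal(n):
    # ceil(log3(n)): lower bound for decimal-fraction weights
    # (assumed here to refer to the maximum of n numbers)
    return ceil_log(3, n)

for n in (4, 5, 6, 11, 29):
    print(n, cpwl_depth_old(n), cpwl_depth_new(n),
          max_depth_new(n), max_lower_bound_decimal(n))
```

Under these formulas the upper and lower bounds for the maximum coincide at $n=5$ (both equal 2) and differ by one at $n=6$ (3 versus 2).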

Paper Structure

This paper contains 12 sections, 6 theorems, 26 equations, 2 figures.

Key Result

Theorem 1

For $n\geq 1$, we have $\mathsf{MAX}_{3^n+2}\in\mathsf{ReLU}_{n+1}$.
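
One way to see how Theorem 1 yields the bound $\lceil\log_3(m-2)\rceil+1$ for the maximum of an arbitrary number $m\geq 4$ of inputs is a padding argument. The following is a sketch and is not quoted from the paper; the symbols $m$ and $k$ are introduced here to avoid clashing with the $n$ of the theorem. Set $k=\lceil\log_3(m-2)\rceil$, so that $k\geq 1$ and

$$3^k \geq m-2 \quad\Longrightarrow\quad 3^k+2\geq m.$$

Repeating the last input is a linear map and therefore costs no additional hidden layer, so

$$\mathsf{MAX}_m(x_1,\dots,x_m)=\mathsf{MAX}_{3^k+2}(x_1,\dots,x_m,\underbrace{x_m,\dots,x_m}_{3^k+2-m\ \text{copies}})\in\mathsf{ReLU}_{k+1},$$

which is a network with $\lceil\log_3(m-2)\rceil+1$ hidden layers. For example, $m=5$ gives $k=1$ and two hidden layers, and $m=11$ gives $k=2$ and three hidden layers.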

Figures (2)

  • Figure 1: Two ways to look at the three-simplex. The highlighted part is one of the four pyramids of the subdivision: a rhombic pyramid with base $x_1,x_{14},x_{34},x_{13}$ and apex $x_{12}$.
  • Figure 2: Subdividing the three-simplex into four identical rhombic pyramids.

Theorems & Definitions (18)

  • Theorem 1
  • Theorem 2
  • Proposition 3
  • Claim 4
  • proof
  • proof (of the proposition labeled prop:max5)
  • Claim 5
  • proof
  • Claim 6
  • proof
  • ...and 8 more