On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth

Gennadiy Averkov, Christopher Hojny, Maximilian Merkert

TL;DR

The paper studies how many hidden layers a ReLU network needs to exactly represent the continuous piecewise linear (CPWL) function $F_n = \max\{0, x_1, \dots, x_n\}$, and proves non-constant lower bounds on the depth when the weights are rational, specifically $N$-ary fractions. Building on a polyhedral characterization of ReLU networks and an SU-closure framework, it uses lattice polytopes and normalized volumes modulo a prime to rule out shallow representations. The main result shows that any network with $N$-ary fractional weights must have at least $\lceil\log_p(n+1)\rceil$ hidden layers for every prime $p \nmid N$. The decimal case ($N=10$) gives $\lceil\log_3(n+1)\rceil$, and for general $N$ the bound is $\Omega(\ln n/\ln \ln N)$. This partially confirms the depth-separation conjecture for rational weights and yields a depth bound that applies to finite-precision networks. The algebraic-invariant technique links neural network expressiveness to polyhedral geometry and number theory, with potential implications for understanding the role of depth and max-pooling in practical architectures.

Abstract

To confirm that the expressive power of ReLU neural networks grows with their depth, the function $F_n = \max \{0,x_1,\ldots,x_n\}$ has been considered in the literature. A conjecture by Hertrich, Basu, Di Summa, and Skutella [NeurIPS 2021] states that any ReLU network that exactly represents $F_n$ has at least $\lceil\log_2 (n+1)\rceil$ hidden layers. The conjecture has recently been confirmed for networks with integer weights by Haase, Hertrich, and Loho [ICLR 2023]. We follow up on this line of research and show that, within ReLU networks whose weights are decimal fractions, $F_n$ can only be represented by networks with at least $\lceil\log_3 (n+1)\rceil$ hidden layers. Moreover, if all weights are $N$-ary fractions, then $F_n$ can only be represented by networks with at least $\Omega\left(\frac{\ln n}{\ln \ln N}\right)$ layers. These results are a partial confirmation of the above conjecture for rational ReLU networks, and provide the first non-constant lower bound on the depth of practically relevant ReLU networks.
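
For context on why the conjectured bound is $\lceil\log_2 (n+1)\rceil$: that many hidden layers are enough to represent $F_n$, using the standard pairwise "tournament" construction based on the identity $\max(a,b) = \mathrm{ReLU}(a-b) + \mathrm{ReLU}(b) - \mathrm{ReLU}(-b)$. The following is a minimal sketch of that construction, not code from the paper. The function names are hypothetical, and the sketch counts tournament rounds (one hidden layer each) rather than building explicit weight matrices.

```python
def relu(t):
    """ReLU activation."""
    return t if t > 0 else 0


def max2(a, b):
    """max(a, b) from one hidden layer of three ReLU units.

    relu(a - b) + b equals max(a, b), and b itself is recovered
    as relu(b) - relu(-b).
    """
    return relu(a - b) + relu(b) - relu(-b)


def f_n_tournament(xs):
    """Evaluate F_n(x) = max{0, x_1, ..., x_n} by pairwise maxima.

    Returns (value, rounds). Each round corresponds to one hidden layer:
    the affine combinations between rounds can be folded into the next
    layer's weights. Illustrative sketch only.
    """
    values = [0] + list(xs)  # n + 1 inputs to the tournament
    rounds = 0
    while len(values) > 1:
        nxt = []
        for i in range(0, len(values) - 1, 2):
            nxt.append(max2(values[i], values[i + 1]))
        if len(values) % 2 == 1:
            # An unpaired value is carried through the layer unchanged.
            v = values[-1]
            nxt.append(relu(v) - relu(-v))
        values = nxt
        rounds += 1
    return values[0], rounds
```

For example, `f_n_tournament([1, -2, 3])` halves the four values `[0, 1, -2, 3]` twice, so it uses two rounds, which matches $\lceil\log_2 (3+1)\rceil = 2$. All weights in this construction are integers, so the open question addressed by the conjecture is only whether fewer layers could ever suffice.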

Paper Structure

This paper contains 14 sections, 18 theorems, 19 equations, 2 figures.

Key Result

Theorem 2

Let $n$ and $N$ be positive integers, and let $p$ be a prime number that does not divide $N$. Every ReLU network with weights being $N$-ary fractions requires at least $\lceil\log_p (n + 1)\rceil$ hidden layers to exactly represent the function $\max\{0,x_1,\dots,x_n\}$.
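
Since Theorem 2 holds for every prime $p$ that does not divide $N$, the strongest bound it gives comes from the smallest such prime. The sketch below evaluates that bound for given $n$ and $N$; the helper names are hypothetical and chosen for illustration, and the logarithm is computed with integer arithmetic to avoid floating-point rounding at exact powers.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def smallest_prime_not_dividing(N):
    """Smallest prime p with p not dividing N (N a positive integer)."""
    p = 2
    while not (is_prime(p) and N % p != 0):
        p += 1
    return p


def ceil_log(base, x):
    """Smallest integer k >= 0 with base**k >= x, for x >= 1."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


def depth_lower_bound(n, N):
    """Best bound from Theorem 2: ceil(log_p(n + 1)) for the
    smallest prime p that does not divide N."""
    p = smallest_prime_not_dividing(N)
    return ceil_log(p, n + 1)
```

For decimal fractions ($N = 10$) the smallest admissible prime is $3$, so `depth_lower_bound(n, 10)` evaluates $\lceil\log_3 (n+1)\rceil$, the bound quoted in the abstract. Under the same reading, binary fractions ($N = 2$) also give $p = 3$, while $N = 30$ forces $p = 7$ and a weaker bound.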

Figures (2)

  • Figure 1: Illustration of the convex hull of a polytope and a point, relating to Proposition \ref{prop:vol:join}.
  • Figure 2: Illustration of the Minkowski sum of two polytopes, relating to Example \ref{ex:polytope:operations}.

Theorems & Definitions (32)

  • Conjecture 1
  • Theorem 2
  • Corollary 3
  • Theorem 4
  • Theorem 5: see, e.g., schneider
  • Remark 6
  • Proposition 7: hhl
  • Theorem 8: hertrich:phd for $R=\mathbb{R}$ and hhl for $R=\mathbb{Z}$
  • Corollary 9
  • proof
  • ...and 22 more