On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Gennadiy Averkov, Christopher Hojny, Maximilian Merkert
TL;DR
The paper studies the depth that ReLU networks need in order to represent the continuous piecewise linear (CPWL) function $F_n = \max\{0, x_1, \dots, x_n\}$ exactly, and proves non-constant lower bounds on that depth when all weights are rational, specifically $N$-ary fractions. Building on a polyhedral characterization of ReLU networks and an SU-closure framework, it uses lattice polytopes and normalized volumes modulo a prime to rule out shallow representations. The main result is that any network with $N$-ary fractional weights representing $F_n$ must have at least $\lceil\log_p(n+1)\rceil$ hidden layers, where $p$ is a prime with $p \nmid N$. The decimal case ($N=10$, $p=3$) gives $\lceil\log_3(n+1)\rceil$, and the general lower bound is $\Omega(\ln n/\ln \ln N)$. This partially confirms the depth-separation conjecture for rational weights and yields a non-constant depth lower bound for finite-precision networks. The technique links neural network expressiveness to polyhedral geometry and number theory through an algebraic invariant, and may help clarify the role of depth and max-pooling in practical architectures.
Abstract
To confirm that the expressive power of ReLU neural networks grows with their depth, the function $F_n = \max \{0,x_1,\ldots,x_n\}$ has been considered in the literature. A conjecture by Hertrich, Basu, Di Summa, and Skutella [NeurIPS 2021] states that any ReLU network that exactly represents $F_n$ has at least $\lceil\log_2 (n+1)\rceil$ hidden layers. The conjecture has recently been confirmed for networks with integer weights by Haase, Hertrich, and Loho [ICLR 2023]. We follow up on this line of research and show that, within ReLU networks whose weights are decimal fractions, $F_n$ can only be represented by networks with at least $\lceil\log_3 (n+1)\rceil$ hidden layers. Moreover, if all weights are $N$-ary fractions, then $F_n$ can only be represented by networks with at least $\Omega\left(\frac{\ln n}{\ln \ln N}\right)$ layers. These results are a partial confirmation of the above conjecture for rational ReLU networks, and provide the first non-constant lower bound on the depth of practically relevant ReLU networks.
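As a rough illustration of how these bounds behave numerically, the sketch below evaluates $\lceil\log_p(n+1)\rceil$ for the smallest prime $p$ that does not divide $N$. Since the bound decreases as $p$ grows, the smallest such prime gives the strongest bound of this form; for $N = 10$ that prime is $3$, matching the decimal case. The function names (`smallest_prime_not_dividing`, `ceil_log`, `depth_lower_bound`) are hypothetical helpers introduced here for illustration and are not taken from the paper. The sketch assumes $N \ge 2$ and $n \ge 1$.

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test; adequate for the small primes needed here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def smallest_prime_not_dividing(N: int) -> int:
    """Return the smallest prime p with p not dividing N (assumes N >= 2)."""
    if N < 2:
        raise ValueError("N must be at least 2")
    p = 2
    while not (is_prime(p) and N % p != 0):
        p += 1
    return p


def ceil_log(base: int, value: int) -> int:
    """Smallest integer k >= 0 with base**k >= value, using exact integer arithmetic."""
    k, power = 0, 1
    while power < value:
        power *= base
        k += 1
    return k


def depth_lower_bound(n: int, N: int) -> int:
    """Evaluate ceil(log_p(n + 1)) for the smallest prime p not dividing N.

    Illustrative only: this computes the number stated in the abstract,
    it does not verify anything about an actual network.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    p = smallest_prime_not_dividing(N)
    return ceil_log(p, n + 1)


if __name__ == "__main__":
    # Decimal weights (N = 10): the relevant prime is 3.
    for n in (2, 8, 9, 26, 27):
        print(n, depth_lower_bound(n, 10))
```

Integer arithmetic is used in `ceil_log` instead of floating-point logarithms so that exact powers such as $n + 1 = 9$ with $p = 3$ are not affected by rounding.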
