Depth-Bounds for Neural Networks via the Braid Arrangement

Moritz Grillo, Christoph Hertrich, Georg Loho

TL;DR

The paper investigates how many hidden layers ReLU networks need to exactly represent continuous piecewise linear (CPWL) functions when the network is constrained to be compatible with the braid fan. It develops a set-function perspective via the braid arrangement and proves that braid-conforming networks need $\Omega(\log\log d)$ hidden layers, a non-constant lower bound, to represent the maximum of $d$ numbers. It also gives a combinatorial proof that, under the same assumption, 3 hidden layers are necessary to compute $\max\{0,x_1,x_2,x_3,x_4\}$, and it shows that maxout networks can be more expressive than the natural generalization of the best known upper bound suggests: a rank-3 maxout layer followed by a rank-2 maxout layer can represent the maximum of $7$ numbers. The set-function framework may extend to other polyhedral fans and CPWL representations.

Abstract

We contribute towards resolving the open question of how many hidden layers are required in ReLU networks for exactly representing all continuous and piecewise linear functions on $\mathbb{R}^d$. While the question has been resolved in special cases, the best known lower bound in general is still 2. We focus on neural networks that are compatible with certain polyhedral complexes, more precisely with the braid fan. For such neural networks, we prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to exactly represent the maximum of $d$ numbers. Additionally, under our assumption, we provide a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers; this had only been verified with an excessive computation so far. Finally, we show that a natural generalization of the best known upper bound to maxout networks is not tight, by demonstrating that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.
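To make the depth question concrete, the following minimal sketch computes a maximum with ReLU units only, using the classical identity $\max\{a,b\} = a + \mathrm{ReLU}(b-a)$ in a pairwise tournament. Each round corresponds to one hidden layer, so $n$ numbers take $\lceil \log_2 n \rceil$ hidden layers. This is an illustration of the standard upper-bound construction, not the paper's method; the function names `relu` and `max_via_relu` are hypothetical.

```python
import math


def relu(t):
    """Rectified linear unit."""
    return max(t, 0.0)


def max_via_relu(values):
    """Compute max(values) with a pairwise ReLU tournament.

    Returns the maximum and the number of rounds used; each round
    plays the role of one hidden layer of a ReLU network.
    """
    layer = [float(v) for v in values]
    hidden_layers = 0
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            a, b = layer[i], layer[i + 1]
            # a is carried through the layer as relu(a) - relu(-a),
            # so max{a, b} = relu(a) - relu(-a) + relu(b - a).
            nxt.append(relu(a) - relu(-a) + relu(b - a))
        if len(layer) % 2 == 1:
            # An unpaired value is passed through unchanged.
            last = layer[-1]
            nxt.append(relu(last) - relu(-last))
        layer = nxt
        hidden_layers += 1
    return layer[0], hidden_layers


# Five numbers, as in max{0, x1, x2, x3, x4}.
value, depth = max_via_relu([0.0, 1.5, -2.0, 3.0, 0.5])
```

For five inputs the tournament uses three rounds, which matches $\lceil \log_2 5 \rceil = 3$; the abstract's claim is that, for braid-conforming networks, no construction can do better than this for the maximum of 5 numbers.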

Paper Structure

This paper contains 16 sections, 23 theorems, 30 equations, 4 figures.

Key Result

Proposition 2.2

The linear map $\Phi \colon \mathcal{V}_{\mathcal{B}_{d}} \to \mathcal{F}_{d}$ given by $F(S)\coloneqq \Phi(f)(S) = f(\mathds{1}_S)$ is an isomorphism.
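The map $\Phi$ can be illustrated numerically. The sketch below assumes that $\mathcal{V}_{\mathcal{B}_{d}}$ denotes the CPWL functions that are linear on every cone of the braid fan and that $\mathcal{F}_{d}$ denotes set functions on subsets of $\{1,\dots,d\}$ with $F(\emptyset)=0$; neither definition appears in this summary. Under that assumption, a function is recovered from its values on indicator vectors by the Lovász extension, which is used here as a stand-in for the inverse of $\Phi$. The helper names `set_function` and `lovasz_extension` are hypothetical, and coordinates are indexed from 0.

```python
from itertools import combinations


def set_function(f, d):
    """Return F with F[S] = f(1_S) for every subset S of {0, ..., d-1}."""
    F = {}
    for k in range(d + 1):
        for S in combinations(range(d), k):
            indicator = [1.0 if i in S else 0.0 for i in range(d)]
            F[frozenset(S)] = f(indicator)
    return F


def lovasz_extension(F, x):
    """Evaluate the Lovasz extension of the set function F at the point x.

    Coordinates are sorted in decreasing order; x is then a combination of
    indicator vectors of the prefix sets of that order, and the extension
    is linear in those coefficients.
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    value = 0.0
    prefix = set()
    for pos, i in enumerate(order):
        prefix.add(i)
        # Coefficient of the indicator vector of the current prefix set.
        nxt = x[order[pos + 1]] if pos + 1 < len(x) else 0.0
        value += (x[i] - nxt) * F[frozenset(prefix)]
    return value


# Example: the coordinate maximum, which is linear on each braid cone.
d = 3
F = set_function(max, d)
point = [-1.0, 2.0, 0.5]
reconstructed = lovasz_extension(F, point)
```

For the coordinate maximum, $F(S)=1$ for every nonempty $S$ and $F(\emptyset)=0$, and evaluating the extension at any point returns that point's largest coordinate, so the set function carries all the information about the function.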

Figures (4)

  • Figure 1:
  • Figure 2: Illustration of \ref{lem:lattice_structure}. The solid line in \ref{fig:solid_line} decomposes the lattice into $[\emptyset, abc] \cup [d,abcd]$, which implies that $\alpha_{\emptyset,abcd}= \alpha_{\emptyset,abc} - \alpha_{d,abcd}$. The dashed line further decomposes $[\emptyset,abc] = [\emptyset,bc] \cup [a,abc]$. The 3 figures illustrate that $\alpha_{S,S\cup \{b,c\}}-\alpha_{S',S'\cup \{b,c\}}\in \mathbb{R}^\mathcal{L}(2)$ for all $S,S' \subseteq \{a,d\}$.
  • Figure 3: An illustration of the induction step. Let $Y= \{a,b,c,d,e\}$, $X = \emptyset$, $\mathcal{L} =[X,Y]$ and $F \in \mathcal{F}_{\mathcal{L}}(2) \cap \mathcal{C}_{\mathcal{L}}$. If $F(a) < 0$ and $F(b) > 0$, then it follows that $F(R)$ for all $R \in [S,S \cup T]$ for $S= ab$ and $T= cde$ (\ref{fig:low_subsets}). In particular, $F \in \mathcal{F}_{S,S \cup T}(1)$ and thus, by \ref{lem:lattice_structure}, it holds that $F \in \mathcal{F}_{S',S' \cup T}(1)$ for all $S' \subseteq Y \setminus T$. \ref{subfig:decomposition} shows the decomposition of the lattice $\mathcal{L}=[X,Y]$ for $T= \{c,d,e\}$ into the sublattices $[S,S\cup T]$ for all $S \subseteq Y \setminus T$. For every such sublattice we have that $F \in \mathcal{F}_{[S,S\cup T]}(1) \cap \mathcal{C}_{[S,S\cup T]}$ and thus, by induction, $\langle {\alpha_{S,S\cup T}},{F^+} \rangle=0$.
  • Figure 4: An illustration of \ref{lem:pairing} (left) and \ref{lem:case1-4} (right). If $\operatorname{supp}(F)\subseteq \mathcal{L}_2 \cup \mathcal{L}_3$, then we can match every $S \in \mathcal{L}_2$ with a $T \in \mathcal{L}_3$ such that $F(T)=F(S)$, which implies $\langle {\alpha_{\emptyset,abcde}},{F^+} \rangle= \sum_{S \in \mathcal{L}_2}F^+(S) - \sum_{T \in \mathcal{L}_3} F^+(T)=0$. If $F(a) <0$ and $F(bcde) > 0$, then it holds that $\langle {\alpha_{\emptyset,abcde}},{F} \rangle= \langle {\alpha_{\emptyset,bcde}},{F} \rangle=0$.

Theorems & Definitions (47)

  • Definition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Lemma 3.1
  • Proposition 3.2
  • Lemma 3.3
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • ...and 37 more