On Expressivity of Height in Neural Networks

Feng-Lei Fan, Ze-Yu Li, Huan Xiong, Tieyong Zeng

TL;DR

This work introduces height as an intra-layer hierarchical dimension of neural networks, yielding 3D networks with width $W$, depth $K$, and height $H$. It provides bound-based and constructive evidence that height substantially increases expressivity: 3D networks can generate exponentially (in $H$) more linear pieces than their 2D counterparts, and the polynomial-approximation error improves to $\mathcal{O}(2^{-2WK})$ without extra parameters. The authors develop a theoretical framework (including upper bounds and tightness results) and demonstrate practical gains through experiments on synthetic, tabular, and image datasets, showing that 3D networks can match or outperform traditional 2D networks with similar resources. The findings suggest that intra-layer height is a powerful, computationally economical augmentation that broadens the design space for foundational architectures and may improve performance on a wide range of tasks.

Abstract

In this work, beyond width and depth, we augment a neural network with a new dimension, called height, by intra-linking neurons in the same layer to create an intra-layer hierarchy. We call a neural network characterized by width, depth, and height a 3D network. To put a 3D network in perspective, we theoretically and empirically investigate the expressivity of height. We show via bound estimation and explicit construction that, given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H\times W$ and depth $K$, i.e., $\mathcal{O}(((2^H-1)W)^K)$ vs. $\mathcal{O}((HW)^K)$, in terms of generating more pieces in a piecewise linear function. Next, through approximation rate analysis, we show that by introducing intra-layer links into networks, a ReLU network of width $\mathcal{O}(W)$ and depth $\mathcal{O}(K)$ can approximate polynomials in $[0,1]^d$ with error $\mathcal{O}\left(2^{-2WK}\right)$, which improves on the rates $\mathcal{O}\left(W^{-K}\right)$ and $\mathcal{O}\left(2^{-K}\right)$ for fixed-width networks. Lastly, numerical experiments on 5 synthetic datasets, 15 tabular datasets, and 3 image benchmarks verify that 3D networks can deliver competitive regression and classification performance.
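
The two comparisons in the abstract can be made concrete with a few lines of arithmetic. The sketch below is an illustration, not code from the paper: it evaluates only the orders of magnitude quoted above, dropping every constant hidden inside $\mathcal{O}(\cdot)$, and the function names are hypothetical. For $W=2$, $K=3$, $H=3$, the 3D piece-count order is $(7\cdot 2)^3 = 2744$ versus $(3\cdot 2)^3 = 216$ for the 2D network of width $H\times W$.

```python
# Orders of magnitude from the abstract, with all O(.) constants dropped.
# These compare growth rates of bounds; they are not measured piece counts.


def pieces_order_2d(width: int, depth: int, height: int) -> int:
    """(H*W)^K: 2D network of width H*W and depth K."""
    return (height * width) ** depth


def pieces_order_3d(width: int, depth: int, height: int) -> int:
    """((2^H - 1) * W)^K: 3D network of width W, depth K, height H."""
    return ((2 ** height - 1) * width) ** depth


def error_orders(width: int, depth: int) -> dict:
    """Polynomial-approximation error orders on [0,1]^d quoted in the abstract."""
    return {
        "with intra-layer links: 2^(-2WK)": 2.0 ** (-2 * width * depth),
        "fixed width: W^(-K)": float(width) ** (-depth),
        "fixed width: 2^(-K)": 2.0 ** (-depth),
    }


if __name__ == "__main__":
    W, K, H = 2, 3, 3
    print("2D order:", pieces_order_2d(W, K, H))  # (3*2)^3 = 216
    print("3D order:", pieces_order_3d(W, K, H))  # (7*2)^3 = 2744
    for label, err in error_orders(W, K).items():
        print(f"{label} = {err:.3e}")
```

For $H=1$ the two piece-count orders coincide, since $2^1-1=1$; for every $H\ge 2$, $2^H-1>H$, so the 3D order is strictly larger.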

Paper Structure

This paper contains 16 sections, 15 theorems, 30 equations, 15 figures, 9 tables.

Key Result

Lemma 4

Let $g: \mathbb{R} \rightarrow \mathbb{R}$ be a piecewise linear (PWL) function with $w+1$ pieces, and let $f := \sigma(g)$, where $\sigma$ is the ReLU. Then the breakpoints of $f$ consist of two parts: some old breakpoints of $g$ and at most $w+1$ newly produced breakpoints. Furthermore, $f$ has $w+1$ new breakpoints if and only if $g$ has $w+1$ distinct zeros …
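
Lemma 4 can be checked on a small example. The sketch below is an illustration, not the paper's code: it assumes $\sigma$ is the ReLU and represents a continuous PWL function on a bounded interval by sorted knots $(x, g(x))$ with linear interpolation between them; `relu_pwl` and `breakpoints` are hypothetical names. The example $g$ has $w+1=3$ pieces and one sign-changing zero in each piece, so $f$ gains 3 new breakpoints, while the old breakpoint at $x=3$ (where $g<0$) disappears and the one at $x=1$ (where $g>0$) survives.

```python
from typing import List, Tuple

Knot = Tuple[float, float]  # (x, g(x)); the function is linear between knots


def relu_pwl(knots: List[Knot]) -> List[Knot]:
    """Return knots of f = max(0, g) for a continuous PWL g given by sorted knots."""
    out: List[Knot] = [knots[0]]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if y0 * y1 < 0:  # g changes sign strictly inside (x0, x1): a new kink of f
            out.append((x0 - y0 * (x1 - x0) / (y1 - y0), 0.0))
        out.append((x1, y1))
    return [(x, max(0.0, y)) for x, y in out]


def breakpoints(knots: List[Knot], tol: float = 1e-12) -> List[float]:
    """Interior knots at which the slope actually changes."""
    result = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(knots, knots[1:], knots[2:]):
        if abs((y1 - y0) / (x1 - x0) - (y2 - y1) / (x2 - x1)) > tol:
            result.append(x1)
    return result


if __name__ == "__main__":
    # g has w + 1 = 3 pieces on [0, 4] and one sign-changing zero in each piece;
    # outside [0, 4] it keeps slope 2, so no further zeros occur.
    g = [(0.0, -1.0), (1.0, 1.0), (3.0, -1.0), (4.0, 1.0)]
    f = relu_pwl(g)
    old = breakpoints(g)  # [1.0, 3.0]
    kept = [x for x in breakpoints(f) if x in old]
    new = [x for x in breakpoints(f) if x not in old]
    print("breakpoints of g:", old)
    print("old breakpoints kept in f:", kept)  # [1.0]; g(3) < 0, so 3.0 vanishes
    print("new breakpoints of f:", new)  # [0.5, 2.0, 3.5]
```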

Figures (15)

  • Figure 1: (a) A 2D network characterized by width and depth. (b) A 3D network characterized by width, depth, and height.
  • Figure 2: 2D-3D transformation via the intra-layer links.
  • Figure 3: Two types of horizontally and vertically uniform networks are used in this paper.
  • Figure 4: Differences between height and depth in achieving higher approximation power, in terms of the mechanism for generating more pieces, the number of affine transforms and activations, and the function classes.
  • Figure 5: Construction of PWL functions that attain the bound of the tightness proposition (label tight_bound_1) when $w_{1}=3$, $w_{2}=2$.
  • ...and 10 more figures

Theorems & Definitions (32)

  • Definition 1: Width and depth of 2D networks [arora2016understanding]
  • Definition 2: Width, depth, and height of 3D networks
  • Definition 3: 2D-3D Transformation, $\mathcal{N}_{W\times H, K}^2 \to \mathcal{N}_{W,K,H}^3$
  • Lemma 4
  • proof
  • Theorem 5: Upper bound of 2D networks $\mathcal{N}_{W,K}^2$
  • proof
  • Lemma 6: A corollary of Lemma 4
  • Theorem 7: Upper bound of 3D networks $\mathcal{N}_{W,K,H}^3$
  • proof
  • ...and 22 more
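
The factor $2^H-1$ in the 3D bound can be motivated by applying Lemma 4 repeatedly within one column of an intra-linked layer. The exact form and weighting of the intra-layer link are not spelled out in this summary, so the sketch below assumes a hypothetical parameterization for a scalar input: $z_1=\sigma(a_1x+b_1)$ and $z_h=\sigma(a_hx+b_h+c_hz_{h-1})$ for $h>1$. If $z_{h-1}$ has at most $2^{h-1}-1$ breakpoints, the pre-activation at height $h$ has at most $2^{h-1}$ pieces, so by Lemma 4 $z_h$ has at most $(2^{h-1}-1)+2^{h-1}=2^h-1$ breakpoints. The code checks this bound on random rational weights; `column_output`, `count_breakpoints`, and `random_weights` are hypothetical names.

```python
import random
from fractions import Fraction
from typing import List, Sequence, Tuple

Knot = Tuple[Fraction, Fraction]  # (x, value); linear interpolation between knots


def relu_pwl(knots: List[Knot]) -> List[Knot]:
    """Exact ReLU of a continuous PWL function: insert zero crossings, then clamp."""
    out: List[Knot] = [knots[0]]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if y0 * y1 < 0:  # the segment crosses zero strictly inside (x0, x1)
            out.append((x0 - y0 * (x1 - x0) / (y1 - y0), Fraction(0)))
        out.append((x1, y1))
    return [(x, max(Fraction(0), y)) for x, y in out]


def count_breakpoints(knots: List[Knot]) -> int:
    """Number of interior knots where the slope changes (exact comparison)."""
    count = 0
    for (x0, y0), (x1, y1), (x2, y2) in zip(knots, knots[1:], knots[2:]):
        if (y1 - y0) / (x1 - x0) != (y2 - y1) / (x2 - x1):
            count += 1
    return count


def column_output(a: Sequence[Fraction], b: Sequence[Fraction], c: Sequence[Fraction],
                  lo: Fraction = Fraction(-10), hi: Fraction = Fraction(10)) -> List[Knot]:
    """One column of H intra-linked ReLU neurons, scalar input x in [lo, hi].

    HYPOTHETICAL link (an assumption, not the paper's definition):
        z_1 = relu(a_1 x + b_1),  z_h = relu(a_h x + b_h + c_h z_{h-1}) for h > 1.
    """
    z: List[Knot] = [(lo, Fraction(0)), (hi, Fraction(0))]  # z_0 = 0, so c_1 is unused
    for a_h, b_h, c_h in zip(a, b, c):
        # Adding an affine function to a PWL function keeps its knots.
        pre = [(x, a_h * x + b_h + c_h * y) for x, y in z]
        z = relu_pwl(pre)
    return z


def random_weights(h: int, rng: random.Random) -> List[Fraction]:
    """h random weights in [-2, 2] with step 1/10."""
    return [Fraction(rng.randint(-20, 20), 10) for _ in range(h)]


if __name__ == "__main__":
    rng = random.Random(0)
    for H in range(1, 6):
        most = max(
            count_breakpoints(column_output(random_weights(H, rng),
                                            random_weights(H, rng),
                                            random_weights(H, rng)))
            for _ in range(100)
        )
        print(f"H={H}: most breakpoints seen = {most}, bound 2^H - 1 = {2 ** H - 1}")
```

Exact rationals (`fractions.Fraction`) are used so that crossing points landing very close to existing knots cannot create spurious kinks through floating-point round-off. The check only confirms the upper bound; it does not claim that random weights, or this particular link, attain $2^H-1$ breakpoints.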