On Expressivity of Height in Neural Networks
Feng-Lei Fan, Ze-Yu Li, Huan Xiong, Tieyong Zeng
TL;DR
This work introduces height as an intra-layer hierarchical dimension of neural networks, yielding 3D networks with width $W$, depth $K$, and height $H$. It provides bound-based and constructive evidence that height increases expressivity: a 3D network can generate exponentially more linear pieces (in $H$) than its 2D counterpart, and intra-layer links improve the polynomial-approximation error to $\mathcal{O}(2^{-2WK})$ without extra parameters. The authors develop a theoretical framework (including upper bounds and tightness results) and report experiments across synthetic, tabular, and image datasets, showing that 3D networks can outperform traditional 2D networks with similar resources. The findings suggest that intra-layer height is a computationally economical augmentation that broadens the design space for foundational architectures and may improve performance on a wide range of tasks.
Abstract
In this work, beyond width and depth, we augment a neural network with a new dimension called height by intra-linking neurons in the same layer to create an intra-layer hierarchy. We call a neural network characterized by width, depth, and height a 3D network. To put a 3D network in perspective, we theoretically and empirically investigate the expressivity of height. We show via bound estimation and explicit construction that, given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H\times W$ and depth $K$, \textit{i.e.}, $\mathcal{O}(((2^H-1)W)^K)$ vs. $\mathcal{O}((HW)^K)$ in terms of the number of pieces generated in a piecewise linear function. Next, through approximation rate analysis, we show that by introducing intra-layer links into networks, a ReLU network of width $\mathcal{O}(W)$ and depth $\mathcal{O}(K)$ can approximate polynomials in $[0,1]^d$ with error $\mathcal{O}\left(2^{-2WK}\right)$, which improves on $\mathcal{O}\left(W^{-K}\right)$ and $\mathcal{O}\left(2^{-K}\right)$ for fixed-width networks. Lastly, numerical experiments on 5 synthetic datasets, 15 tabular datasets, and 3 image benchmarks verify that 3D networks can deliver competitive regression and classification performance.
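
To make the piece-count comparison concrete, the following is a minimal Python sketch, not the paper's construction. It evaluates the two expressions inside the $\mathcal{O}(\cdot)$ bounds (constants are ignored, so these are not exact piece counts) and counts the linear pieces of a toy one-dimensional layer with two ReLU neurons, with and without a single intra-layer link. The weights, the link weight of $-3$, and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def relu(z):
    """ReLU on exact rationals."""
    return z if z > 0 else Fraction(0)


def two_neuron_layer(x, link_weight):
    """Toy 1D layer with two ReLU neurons (hypothetical weights).

    The second neuron also receives the first neuron's output through
    an intra-layer link. link_weight = 0 recovers an ordinary layer.
    """
    h1 = relu(x)
    h2 = relu(x + 1 + link_weight * h1)
    return h1 + h2


def count_linear_pieces(f, lo=-2, hi=2, steps_per_unit=4):
    """Count linear pieces of f on [lo, hi] from slope changes on a grid.

    Exact only if every breakpoint of f lies on the grid, which holds
    for the toy layer above (breakpoints at -1, 0, and 1/2).
    """
    xs = [Fraction(k, steps_per_unit)
          for k in range(lo * steps_per_unit, hi * steps_per_unit + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


def bound_3d(W, K, H):
    """Expression inside the 3D bound: ((2^H - 1) * W)^K."""
    return ((2 ** H - 1) * W) ** K


def bound_2d(W, K, H):
    """Expression inside the 2D bound for width H*W: (H * W)^K."""
    return (H * W) ** K


plain_pieces = count_linear_pieces(lambda x: two_neuron_layer(x, 0))
linked_pieces = count_linear_pieces(lambda x: two_neuron_layer(x, -3))

print(plain_pieces, linked_pieces)                      # 3 4
print(bound_3d(W=2, K=3, H=3), bound_2d(W=2, K=3, H=3))  # 2744 216
```

In this toy case the link lets the second neuron's pre-activation bend at the first neuron's breakpoint, so the same two neurons yield four pieces instead of three. Exact rationals are used so that the slope comparison is not affected by floating-point rounding.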
