On Expressivity of Height in Neural Networks

Feng-Lei Fan; Ze-Yu Li; Huan Xiong; Tieyong Zeng

TL;DR

This work introduces height as an intra-layer hierarchical dimension of neural networks, yielding 3D networks with width $W$, depth $K$, and height $H$. Through bound estimation and explicit construction, it shows that height increases expressivity: with the same number of neurons and parameters, a 3D ReLU network can generate exponentially more linear pieces (in $H$) than its 2D counterpart, and intra-layer links improve the polynomial approximation error to $\mathcal{O}(2^{-2WK})$. The authors develop the supporting theory (upper bounds and tightness results) and run experiments on synthetic, tabular, and image datasets, where 3D networks deliver competitive performance relative to 2D networks with similar resources. The findings suggest that intra-layer height is an economical augmentation that broadens the design space of foundational architectures.

Abstract

In this work, beyond width and depth, we augment a neural network with a new dimension called height by intra-linking neurons in the same layer to create an intra-layer hierarchy. We call a neural network characterized by width, depth, and height a 3D network. To put a 3D network in perspective, we theoretically and empirically investigate the expressivity of height. We show via bound estimation and explicit construction that, given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H\times W$ and depth $K$, \textit{i.e.}, $\mathcal{O}(((2^H-1)W)^K)$ vs $\mathcal{O}((HW)^K)$, in terms of generating more pieces in a piecewise linear function. Next, through approximation rate analysis, we show that by introducing intra-layer links into networks, a ReLU network of width $\mathcal{O}(W)$ and depth $\mathcal{O}(K)$ can approximate polynomials in $[0,1]^d$ with error $\mathcal{O}\left(2^{-2WK}\right)$, which improves on the $\mathcal{O}\left(W^{-K}\right)$ and $\mathcal{O}\left(2^{-K}\right)$ rates for fixed-width networks. Lastly, numerical experiments on 5 synthetic datasets, 15 tabular datasets, and 3 image benchmarks verify that 3D networks can deliver competitive regression and classification performance.
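To make the gap between the two piece-count bounds concrete, the following minimal sketch evaluates only the leading terms $((2^H-1)W)^K$ and $(HW)^K$ quoted in the abstract. The function names are hypothetical, and the constants hidden by the $\mathcal{O}(\cdot)$ notation are ignored, so the printed numbers illustrate growth rates rather than exact piece counts.

```python
def pieces_2d(width: int, depth: int, height: int) -> int:
    """Leading term (H*W)^K for a 2D network of width H*W and depth K."""
    return (height * width) ** depth


def pieces_3d(width: int, depth: int, height: int) -> int:
    """Leading term ((2^H - 1)*W)^K for a 3D network of width W, depth K, height H."""
    return ((2 ** height - 1) * width) ** depth


# Compare the two leading terms for a few (W, K, H) settings.
# Big-O constants are dropped, so only the relative growth is meaningful.
for W, K, H in [(2, 2, 1), (2, 2, 2), (3, 2, 3), (4, 3, 4)]:
    two_d = pieces_2d(W, K, H)
    three_d = pieces_3d(W, K, H)
    print(f"W={W} K={K} H={H}: 2D ~ {two_d}, 3D ~ {three_d}, ratio ~ {three_d / two_d:.2f}")
```

For $H=1$ the two terms coincide, since $2^1-1=1$; the gap appears once $H\geq 2$ and widens with both $H$ and $K$.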

Paper Structure (16 sections, 15 theorems, 30 equations, 15 figures, 9 tables)

Introduction
Related Work
Notation and Definition
Approximation Mechanism of Height
Height Can Greatly Increase the Number of Pieces
Upper Bound Estimation
Tightness of Bounds
Height Can Greatly Improve the Approximation Rate
Experiments
Regression
Results on Synthetic Datasets
Results on Real-world Datasets
Classification
Tabular Datasets
Image Datasets
...and 1 more sections

Key Result

Lemma 4

Let $g: \mathbb{R} \rightarrow \mathbb{R}$ be a PWL function with $w+1$ pieces, then the breakpoints of $f:= \sigma(g)$ consist of two parts: some old breakpoints of $g$ and at most $w+1$ newly produced breakpoints. Furthermore, $f$ has $w+1$ new breakpoints if and only if $g$ has $w+1$ distinct zer

Figures (15)

Figure 1: (a) A 2D network characterized by width and depth. (b) A 3D network characterized by width, depth, and height.
Figure 2: 2D-3D transformation via the intra-layer links.
Figure 3: Two types of horizontally and vertically uniform networks are used in this paper.
Figure 4: Differences of height and depth in accomplishing higher approximation power in terms of the mechanism of generating more pieces, the number of (affine transforms, activation), and function classes.
Figure 5: Construction of PWL functions to reach the bound of Proposition \ref{tight_bound_1} when $w_{1}=3$, $w_{2}=2$.
...and 10 more figures

Theorems & Definitions (32)

Definition 1: Width and depth of 2D networks \cite{arora2016understanding}
Definition 2: Width, depth, and height of 3D networks
Definition 3: 2D-3D Transformation, $\mathcal{N}_{W\times H, K}^2 \to \mathcal{N}_{W,K,H}^3$
Lemma 4
proof
Theorem 5: Upper bound of 2D networks $\mathcal{N}_{W,K}^2$
proof
Lemma 6: A corollary of Lemma 4
Theorem 7: Upper bound of three dimensional networks $\mathcal{N}_{W,K,H}^3$
proof
...and 22 more
