Functional dimension of feedforward ReLU neural networks

J. Elisenda Grigsby; Kathryn Lindsey; Robert Meyerhoff; Chenxi Wu

TL;DR

This work studies the mapping from neural network parameters to realized ReLU functions via the realization map $\rho$, introducing the central notion of functional dimension $\mathrm{dim}_{\mathrm{fun}}(\theta)$ to quantify local parameter-space redundancy. It shows that, due to positive-dimensional input/output scaling symmetries, the functional dimension is generally strictly smaller than the full parametric dimension and varies across parameter space, i.e., it is inhomogeneous. The authors connect functional dimension to the Neural Tangent Kernel and its batch variant, derive an upper bound, and prove tightness for narrowing architectures, while also analyzing combinatorial stability, canonical polyhedral complexes, and the structure of fibers and symmetries. The results illuminate how parameter-space geometry shapes identifiability and gradient-descent dynamics in overparameterized ReLU networks, with implications for training and generalization.
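As a rough illustration of the batch variant, the sketch below treats the batch functional dimension at $\theta$ as the rank of the Jacobian, with respect to the parameters, of the map sending $\theta$ to the outputs $\rho(\theta)(z)$ on a finite batch of inputs. That reading, the $(1,2,1)$ architecture, the parameter ordering $(w_1, b_1, w_2, b_2, v_1, v_2, c)$, and the helper names `gradient_row`, `exact_rank`, and `batch_functional_dimension` are assumptions made for this example and are not taken from the paper's text.

```python
from fractions import Fraction

def gradient_row(theta, x):
    """Gradient of rho(theta)(x) w.r.t. theta for a (1, 2, 1) ReLU network.

    Assumed parameter ordering: theta = (w1, b1, w2, b2, v1, v2, c), with
    rho(theta)(x) = relu(v1*relu(w1*x + b1) + v2*relu(w2*x + b2) + c).
    """
    w1, b1, w2, b2, v1, v2, c = theta
    z1, z2 = w1 * x + b1, w2 * x + b2
    h1, h2 = max(z1, 0), max(z2, 0)
    out = v1 * h1 + v2 * h2 + c
    if 0 in (z1, z2, out):
        raise ValueError("x lies on a kink; the gradient is not defined there")
    d1, d2, d_out = int(z1 > 0), int(z2 > 0), int(out > 0)
    row = (v1 * d1 * x, v1 * d1, v2 * d2 * x, v2 * d2, h1, h2, 1)
    return [d_out * entry for entry in row]

def exact_rank(rows):
    """Rank by Gaussian elimination over the rationals (no rounding error)."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def batch_functional_dimension(theta, batch):
    """Rank of the Jacobian of theta -> (rho(theta)(x) for x in batch)."""
    return exact_rank([gradient_row(theta, x) for x in batch])

theta_0 = (2, -5, -1, 4, 1, 1, 1)
print(batch_functional_dimension(theta_0, [0, 1]))           # 2
print(batch_functional_dimension(theta_0, [0, 1, 5, 6]))     # 4
print(batch_functional_dimension(theta_0, [0, 1, 3, 5, 6]))  # 5
```

In this toy case the rank grows with the batch and stops at 5, below the parametric dimension 7. The gap of 2 is consistent with one positive scaling symmetry per hidden neuron.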

Abstract

It is well known that the parameterized family of functions representable by fully-connected feedforward neural networks with ReLU activation function is precisely the class of piecewise linear functions with finitely many pieces. It is less well known that for every fixed architecture of ReLU neural network, the parameter space admits positive-dimensional spaces of symmetries, and hence the local functional dimension near any given parameter is lower than the parametric dimension. In this work we carefully define the notion of functional dimension, show that it is inhomogeneous across the parameter space of ReLU neural network functions, and continue an investigation, initiated in [14] and [5], into when the functional dimension achieves its theoretical maximum. We also study the quotient space and fibers of the realization map from parameter space to function space, supplying examples of fibers that are disconnected, fibers upon which functional dimension is non-constant, and fibers upon which the symmetry group acts non-transitively.
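One familiar source of such symmetries is positive rescaling of a hidden neuron: multiplying its incoming weight and bias by $c > 0$ and dividing its outgoing weight by $c$ leaves the realized function unchanged, because $\sigma(ct) = c\,\sigma(t)$ for $c > 0$. The sketch below checks this numerically on a small $(1,2,1)$ network. The parameter ordering and the helper names `rho` and `rescale_hidden_neuron` are assumptions made for this example.

```python
import math

def relu(t):
    return max(t, 0.0)

def rho(theta, x):
    """Realized function of a (1, 2, 1) ReLU network.

    Assumed parameter ordering: theta = (w1, b1, w2, b2, v1, v2, c).
    """
    w1, b1, w2, b2, v1, v2, c = theta
    return relu(v1 * relu(w1 * x + b1) + v2 * relu(w2 * x + b2) + c)

def rescale_hidden_neuron(theta, neuron, scale):
    """Scale one hidden neuron's incoming weight and bias by `scale` > 0
    and its outgoing weight by 1/scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    w1, b1, w2, b2, v1, v2, c = theta
    if neuron == 0:
        return (scale * w1, scale * b1, w2, b2, v1 / scale, v2, c)
    if neuron == 1:
        return (w1, b1, scale * w2, scale * b2, v1, v2 / scale, c)
    raise ValueError("neuron must be 0 or 1")

theta = (2, -5, -1, 4, 1, 1, 1)
theta_scaled = rescale_hidden_neuron(theta, neuron=0, scale=3.0)

# Different parameters, same function (up to floating-point rounding).
xs = [i / 4 for i in range(-20, 41)]
same = all(math.isclose(rho(theta, x), rho(theta_scaled, x), abs_tol=1e-12)
           for x in xs)
print(theta_scaled != theta, same)
```

Since `scale` ranges over a continuum, each hidden neuron contributes a one-parameter family of distinct parameters realizing the same function, which is why the functional dimension falls below the parameter count.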

Paper Structure (31 sections, 47 theorems, 174 equations, 1 figure)

Introduction
Related work
Setup and background
Feedforward ReLU neural networks, associated spaces, and the realization map
Background on general polyhedral complexes
The canonical polyhedral complex, generic and transversal neural networks
Ternary labeling
Rank of a smooth map
Smoothness of the parameterized family $\mathcal{F}_{n_0,\ldots,n_m}$
The parameterized family is finitely piecewise polynomial
Parametrically smooth points
Definitions and examples of functional dimension
Stochastic functional dimension
Batch functional dimension
Functional dimension
...and 16 more sections

Key Result

Lemma 2.8

Let $F$ be a ReLU neural network, where $F^{i}$ has associated affine-linear map represented by the matrix $A^{i} \in M_{n_{i} \times (n_{i-1}+1)}$, and let $x = x^{0} \in \mathbb{R}^{n_0}$ be an input vector in the interior of a cell $C$ with associated ternary tuple $s_C = \left(s_x^{1}, \ldots, s_x^{m}\right)$ [...] where $x^{\ell}$ is as defined in eq:xell.

Figures (1)

Figure 1: For $\theta_0 = (2,-5,-1,4,1,1,1)$, the function $\rho(\theta_0)$ has the form $\rho(\theta_0)(x) = \sigma(\sigma(2x-5) + \sigma(-x+4)+1)$.
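The function in the caption can be evaluated directly. The sketch below assumes that $\theta_0$ lists the parameters layer by layer as $(w_1, b_1, w_2, b_2, v_1, v_2, c)$, which is the ordering that reproduces the stated formula. The function name `rho` is chosen for this example.

```python
def relu(t):
    return max(t, 0.0)

def rho(theta, x):
    """rho(theta)(x) = relu(v1*relu(w1*x + b1) + v2*relu(w2*x + b2) + c),
    with the assumed ordering theta = (w1, b1, w2, b2, v1, v2, c)."""
    w1, b1, w2, b2, v1, v2, c = theta
    return relu(v1 * relu(w1 * x + b1) + v2 * relu(w2 * x + b2) + c)

theta_0 = (2, -5, -1, 4, 1, 1, 1)

# Evaluate on either side of, and at, the two breakpoints x = 2.5 and x = 4.
for x in (0, 2.5, 3, 4, 5):
    print(x, rho(theta_0, x))
```

Working through the three regions by hand, $\rho(\theta_0)(x)$ equals $5 - x$ for $x \le 2.5$, $x$ for $2.5 \le x \le 4$, and $2x - 4$ for $x \ge 4$. It is a piecewise linear function with three pieces, and the outer ReLU is always active because its argument is at least 1.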

Theorems & Definitions (142)

Example 1.1
Definition 2.1
Remark 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Remark 2.6
Definition 2.7
Lemma 2.8
Lemma 2.9
...and 132 more
