Functional dimension of feedforward ReLU neural networks

J. Elisenda Grigsby, Kathryn Lindsey, Robert Meyerhoff, Chenxi Wu

TL;DR

This work studies the mapping from neural network parameters to realized ReLU functions via the realization map $\rho$, introducing the central notion of functional dimension $\mathrm{dim}_{\mathrm{fun}}(\theta)$ to quantify local parameter-space redundancy. It shows that, due to positive-dimensional input/output scaling symmetries, the functional dimension is generally strictly smaller than the full parametric dimension and varies across parameter space, i.e., it is inhomogeneous. The authors connect functional dimension to the Neural Tangent Kernel and its batch variant, derive an upper bound, and prove tightness for narrowing architectures, while also analyzing combinatorial stability, canonical polyhedral complexes, and the structure of fibers and symmetries. The results illuminate how parameter-space geometry shapes identifiability and gradient-descent dynamics in overparameterized ReLU networks, with implications for training and generalization.

Abstract

It is well known that the parameterized family of functions representable by fully-connected feedforward neural networks with ReLU activation function is precisely the class of piecewise linear functions with finitely many pieces. It is less well known that for every fixed architecture of ReLU neural network, the parameter space admits positive-dimensional spaces of symmetries, and hence the local functional dimension near any given parameter is lower than the parametric dimension. In this work we carefully define the notion of functional dimension, show that it is inhomogeneous across the parameter space of ReLU neural network functions, and continue an investigation, initiated in [14] and [5], into when the functional dimension achieves its theoretical maximum. We also study the quotient space and fibers of the realization map from parameter space to function space, supplying examples of fibers that are disconnected, fibers upon which functional dimension is non-constant, and fibers upon which the symmetry group acts non-transitively.
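
The positive-dimensional symmetries mentioned above include the familiar positive rescaling of a hidden neuron: multiplying its incoming weights and bias by some $c > 0$ and dividing its outgoing weights by $c$ leaves the realized function unchanged, because $\sigma(cz) = c\,\sigma(z)$ for $c > 0$. The following is a minimal sketch of that fact, not code from the paper. The architecture (2 inputs, 3 hidden neurons, 1 output), the affine (non-ReLU) output layer, and the helper names `forward` and `rescale_neuron` are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, b1, W2, b2, x):
    # One hidden ReLU layer followed by an affine output layer (illustrative choice).
    return W2 @ relu(W1 @ x + b1) + b2

def rescale_neuron(W1, b1, W2, j, c):
    # Scale hidden neuron j: incoming weights and bias times c, outgoing weights divided by c.
    assert c > 0, "the symmetry only holds for positive scalings"
    W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
    W1s[j, :] *= c
    b1s[j] *= c
    W2s[:, j] /= c
    return W1s, b1s, W2s

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
W1s, b1s, W2s = rescale_neuron(W1, b1, W2, j=1, c=2.5)

# Compare the two parameter settings on random inputs.
xs = rng.normal(size=(100, 2))
max_gap = max(
    abs(forward(W1, b1, W2, b2, x) - forward(W1s, b1s, W2s, b2, x)).max()
    for x in xs
)
print(max_gap)  # agrees up to floating-point rounding
```

Each hidden neuron contributes one such one-parameter family of distinct parameters realizing the same function, which is why the local functional dimension falls short of the raw parameter count.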

Paper Structure (31 sections, 47 theorems, 174 equations, 1 figure)

This paper contains 31 sections, 47 theorems, 174 equations, 1 figure.

Key Result

Lemma 2.8

Let $F$ be a ReLU neural network with layer maps $F^{1}, \ldots, F^{m}$, where $F^{i}$ has associated affine-linear map represented by the matrix $A^{i} \in M_{n_{i} \times (n_{i-1}+1)}$, and let $x = x^{0} \in \mathbb{R}^{n_0}$ be an input vector in the interior of a cell $C$ with associated ternary tuple $s_C = \left(s_x^{1}, \ldots, s_x^{m}\right)$ ... where $x^{\ell}$ is as defined in eq:xell.

Figures (1)

  • Figure 1: For $\theta_0 = (2,-5,-1,4,1,1,1)$, the function $\rho(\theta_0)$ has the form $\rho(\theta_0)(x) = \sigma(\sigma(2x-5) + \sigma(-x+4)+1)$.
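
The function in Figure 1 can be evaluated directly, and the rank of the Jacobian of its values on a batch of inputs with respect to the parameters gives a concrete handle on local redundancy at $\theta_0$. The sketch below is illustrative only. It assumes the parameter ordering $(w_1, b_1, w_2, b_2, v_1, v_2, c)$, which is the reading consistent with the displayed formula, and it uses a hand-picked batch of inputs away from the kinks at $x = 2.5$ and $x = 4$. The helper names `realize` and `param_gradient` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def realize(theta, x):
    # rho(theta)(x) for architecture (1, 2, 1), with ReLU applied on the output as in Figure 1.
    w1, b1, w2, b2, v1, v2, c = theta
    return relu(v1 * relu(w1 * x + b1) + v2 * relu(w2 * x + b2) + c)

def param_gradient(theta, x):
    # Gradient of rho(theta)(x) with respect to theta, valid away from the kinks.
    w1, b1, w2, b2, v1, v2, c = theta
    z1, z2 = w1 * x + b1, w2 * x + b2
    a1, a2 = float(z1 > 0), float(z2 > 0)          # hidden activation pattern
    pre_out = v1 * relu(z1) + v2 * relu(z2) + c
    a_out = float(pre_out > 0)                      # output activation pattern
    return a_out * np.array(
        [v1 * a1 * x, v1 * a1, v2 * a2 * x, v2 * a2, relu(z1), relu(z2), 1.0]
    )

theta0 = (2.0, -5.0, -1.0, 4.0, 1.0, 1.0, 1.0)

# Batch with points in each linear region: x < 2.5, 2.5 < x < 4, x > 4.
batch = [0.0, 1.0, 2.0, 3.0, 3.5, 5.0, 6.0]
J = np.stack([param_gradient(theta0, x) for x in batch])  # shape (7, 7)
rank = np.linalg.matrix_rank(J)
print(rank)  # 5: two fewer than the 7 parameters
```

For this batch the rank is 5 rather than 7. The deficit of 2 matches the two hidden-neuron scaling directions, along which the realized function does not change.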

Theorems & Definitions (142)

  • Example 1.1
  • Definition 2.1
  • Remark 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Remark 2.6
  • Definition 2.7
  • Lemma 2.8
  • Lemma 2.9
  • ...and 132 more