
On Functional Dimension and Persistent Pseudodimension

J. Elisenda Grigsby, Kathryn Lindsey

TL;DR

This work studies local complexity in fixed-architecture ReLU networks through two local measures, the local functional dimension $\dim_{\mathrm{fun}}(\theta)$ and the persistent pseudodimension $\dim_{p.\mathrm{VC}\Delta}(\mathcal{F},\theta)$. It establishes the inequality $\dim_{\mathrm{fun}}(\theta) \le \dim_{p.\mathrm{VC}\Delta}(\mathcal{F},\theta) \le \sup_{Z} r_{\mathbb{R}}(\mathbf{J}E^R_Z(\theta))$, which links parameter redundancy to the real rank of the algebraic Jacobian. It develops a batch-wise algebraic framework, built from the algebraic evaluation map $E_Z^R$, the activation matrices $\alpha^R$ and $\alpha$, and a local batch-fiber structure, to derive locality-aware bounds on the persistent pseudodimension and relate it to functional dimension. The paper shows how the rank gap between the polynomial-ring and real-valued Jacobians governs these bounds, and discusses overparameterization as a regime where this gap may close, with support from McCoy's theorem. A key conjecture is that for generic, overparameterized ReLU families the two local complexities coincide, which would give a principled explanation of observed generalization behavior and its relation to double descent.

Abstract

For any fixed feedforward ReLU neural network architecture, it is well-known that many different parameter settings can determine the same function. It is less well-known that the degree of this redundancy is inhomogeneous across parameter space. In this work, we discuss two locally applicable complexity measures for ReLU network classes and what we know about the relationship between them: (1) the local functional dimension [14, 18], and (2) a local version of VC dimension that we call persistent pseudodimension. The former is easy to compute on finite batches of points; the latter should give local bounds on the generalization gap, which would inform an understanding of the mechanics of the double descent phenomenon [7].
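The abstract notes that local functional dimension is easy to compute on finite batches. The following is a minimal sketch, assuming the batch definition from the functional-dimension literature [14]: for a batch $Z = \{z_1, \dots, z_k\}$, the batch functional dimension at $\theta$ is the rank of the Jacobian of the evaluation map $E_Z(\theta) = (F_\theta(z_1), \dots, F_\theta(z_k))$, taken at a parameter where this map is differentiable. It uses NumPy, assumes a scalar-output network with ReLU on hidden layers and an affine output layer, and builds the Jacobian by hand-written backpropagation. The helper names `init_params`, `param_gradient`, and `batch_functional_dimension` are hypothetical, not taken from the paper.

```python
import numpy as np


def init_params(arch, rng):
    """Gaussian weights/biases for a fully connected net with layer widths arch."""
    return [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
            for n_in, n_out in zip(arch[:-1], arch[1:])]


def param_gradient(params, x):
    """Gradient of the scalar output F_theta(x) w.r.t. all parameters,
    flattened as (W^1, b^1, ..., W^d, b^d); ReLU on hidden layers only."""
    activations, preacts = [x], []
    for l, (W, b) in enumerate(params):
        z = W @ activations[-1] + b
        preacts.append(z)
        last = l == len(params) - 1
        activations.append(z if last else np.maximum(z, 0.0))
    delta = np.ones(params[-1][0].shape[0])  # dF/dz at the affine output layer
    grads = []
    for l in range(len(params) - 1, -1, -1):
        W, _ = params[l]
        grads.append((np.outer(delta, activations[l]), delta))
        if l > 0:
            # Chain rule through ReLU: only active neurons pass gradient back.
            delta = (W.T @ delta) * (preacts[l - 1] > 0)
    grads.reverse()
    return np.concatenate([np.concatenate([gW.ravel(), gb]) for gW, gb in grads])


def batch_functional_dimension(params, Z):
    """Numerical rank of the k x D Jacobian of theta -> (F_theta(z))_{z in Z}."""
    J = np.stack([param_gradient(params, z) for z in Z])
    return np.linalg.matrix_rank(J), J


rng = np.random.default_rng(0)
arch = (2, 3, 3, 1)
params = init_params(arch, rng)
D = sum(W.size + b.size for W, b in params)  # 25 parameters for (2, 3, 3, 1)
Z = rng.standard_normal((50, arch[0]))       # an arbitrary batch of 50 input points
rank, J = batch_functional_dimension(params, Z)
print(f"D = {D}, batch functional dimension = {rank}")
```

Positive rescaling of a hidden neuron (multiply its incoming weights and bias by $c > 0$ and its outgoing weights by $1/c$) leaves $F_\theta$ unchanged. So at a generic $\theta$ for architecture (2,3,3,1) the rank cannot exceed $25 - 6 = 19$, however many points $Z$ contains. This is one concrete instance of the parameter redundancy the abstract describes. `np.linalg.matrix_rank` uses a default SVD tolerance, so results near degenerate parameters should be read with care.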

Paper Structure

This paper contains 21 sections, 26 theorems, 115 equations, 1 figure.

Key Result

Theorem 1

Fix a finite set $Z$ of points in input space $\mathbb{R}^{n_0}$. For any architecture of feedforward ReLU neural networks $\mathbb{R}^{n_0} \to \mathbb{R}$, locally near almost every parameter $\theta$, the parameter space $\Omega \cong \mathbb{R}^D$ is (diffeomorphic to) a product of a $\textrm{di

Figures (1)

  • Figure 1: An augmented computational graph for architecture (2,3,3,1). The ordinary vertices are black, and the distinguished vertices are red. Black edges are labeled with weights, and red edges are labeled with biases. A complete path is one that ends at an output vertex and begins either at an input vertex or at one of the distinguished vertices. In the diagram above, we have blurred out vertices corresponding to inactive neurons associated to an input vector $x$ for some parameter $\theta$. The open paths are then the ones in the diagram above with solid (non-dashed) edges. The reader can check that there are three open, complete paths $\gamma, \gamma', \gamma'' \in \Gamma^\theta_{x,*}$, whose monomials are $m(\gamma) = b_3^1W^2_{13}W^3_{11}$, $m(\gamma') = b_1^2W^3_{11}$, and $m(\gamma'') = b_1^3$. There is a unique open, complete path $\gamma_1 \in \Gamma^\theta_{x,1}$, with monomial $m(\gamma_1) = W^1_{31}W^2_{13}W^3_{11}$ and a unique open, complete path $\gamma_2 \in \Gamma^\theta_{x,2}$, with monomial $m(\gamma_2) = W^1_{32}W^2_{13}W^3_{11}$.
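The path bookkeeping in Figure 1 can be checked numerically. The following is a minimal sketch, assuming the indexing read off the caption's monomials ($W^l_{ij}$ is the weight from neuron $j$ of layer $l-1$ to neuron $i$ of layer $l$, and $b^l_i$ is the bias of neuron $i$ in layer $l$) and a single affine output neuron. Under these assumptions, on the linear region containing $x$, $F_\theta(x)$ should equal the sum of $m(\gamma)$ over $\gamma \in \Gamma^\theta_{x,*}$ plus $\sum_i x_i \sum_{\gamma \in \Gamma^\theta_{x,i}} m(\gamma)$. The functions `forward_with_pattern` and `path_expansion` are illustrative names, not from the paper.

```python
import itertools

import numpy as np


def forward_with_pattern(params, x):
    """Evaluate F_theta(x); also return the boolean activation mask of each hidden layer."""
    a, masks = x, []
    for l, (W, b) in enumerate(params):
        z = W @ a + b
        if l < len(params) - 1:  # hidden layer: record the pattern, apply ReLU
            masks.append(z > 0)
            a = np.maximum(z, 0.0)
        else:                    # output layer is affine (no ReLU)
            a = z
    return a, masks


def path_expansion(params, x, masks):
    """Sum of x_i * m(gamma) over open input paths plus m(gamma) over open bias paths."""
    d = len(params)  # number of affine layers; d = 3 for (2, 3, 3, 1)
    sizes = [params[0][0].shape[1]] + [W.shape[0] for W, _ in params]
    total = 0.0
    # s = 0: the path starts at an input vertex; s >= 1: at the bias vertex of a layer-s neuron.
    for s in range(d + 1):
        for idx in itertools.product(*(range(n) for n in sizes[s:])):
            # idx[k] is the neuron visited in layer s + k; idx[-1] is the output neuron.
            hidden = [(l, i) for l, i in zip(range(s, d + 1), idx) if 1 <= l <= d - 1]
            if not all(masks[l - 1][i] for l, i in hidden):
                continue  # the path passes through an inactive neuron, so it is not open
            # Running product; for input paths it already includes the factor x_i.
            m = x[idx[0]] if s == 0 else params[s - 1][1][idx[0]]
            for k in range(1, len(idx)):
                W = params[s + k - 1][0]       # W^{s+k}: layer s+k-1 -> layer s+k
                m *= W[idx[k], idx[k - 1]]     # weight into neuron idx[k] from idx[k-1]
            total += m
    return total


rng = np.random.default_rng(1)
arch = (2, 3, 3, 1)
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(arch[:-1], arch[1:])]
x = rng.standard_normal(2)
F, masks = forward_with_pattern(params, x)
print(F[0], path_expansion(params, x, masks))
```

Writing $D_l$ for the diagonal 0/1 activation matrix of hidden layer $l$, expanding $W^3 D_2 (W^2 D_1 (W^1 x + b^1) + b^2) + b^3$ entry by entry gives exactly this sum over open, complete paths. This is why the two printed numbers agree up to floating-point rounding.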

Theorems & Definitions (95)

  • Theorem: Informal version of Theorem \ref{t:batchfiberfoliation}, Batch Fiber Product Structure
  • Definition 2.1: parameterized family
  • Definition 2.2: function, realization map, evaluation map
  • Remark 2.3
  • Definition 2.4: parameterized family of differences
  • Definition 2.5: finitely piecewise smooth/polynomial
  • Definition 2.6
  • Definition 2.7
  • Remark 2.8
  • Remark 2.9
  • ...and 85 more