Exploring the Complexity of Deep Neural Networks through Functional Equivalence

Guohao Shen

TL;DR

This work studies how functional equivalence among parameterizations creates redundancy in deep neural networks, and it exploits permutation invariance to derive new, tighter bounds on the covering number. The author constructs representative parameter sets that factor out this symmetry and shows that the volume of the effective parameter space shrinks by a factor of $d_1!\cdots d_L!$, where $d_l$ is the width of the $l$-th hidden layer. The resulting bounds are stated explicitly in terms of network width, depth, and parameter norms. The analysis extends to CNNs, ResNets, and attention modules, and it links these bounds to better generalization in overparameterized regimes (through a smaller estimation error term) and to easier optimization (through a higher likelihood of reaching a convergent solution). The results offer a principled account of overparameterization and practical insight into capacity control and training dynamics in modern architectures.
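To get a feel for how large the $d_1!\cdots d_L!$ factor is, the sketch below computes it for given hidden-layer widths, assuming (as in the usual permutation argument) that $d_l$ is the number of units in hidden layer $l$. It computes only this combinatorial factor, not the paper's covering-number bound, and the function names are hypothetical. Working in log space with `math.lgamma` avoids overflow for wide layers.

```python
import math

def permutation_factor(widths):
    """Exact d_1! * ... * d_L! for hidden-layer widths d_1..d_L (small widths only)."""
    result = 1
    for d in widths:
        result *= math.factorial(d)
    return result

def log_permutation_factor(widths):
    """Natural log of d_1! * ... * d_L!, using lgamma(d + 1) == log(d!) to avoid overflow."""
    return sum(math.lgamma(d + 1) for d in widths)

# How the (base-10) size of the factor grows as every hidden layer gets wider.
for widths in ([8, 8], [16, 16], [32, 32], [64, 64]):
    print(widths, round(log_permutation_factor(widths) / math.log(10), 1))
```

Since $\log d! \approx d\log d - d$ (Stirling), the log of the factor grows slightly faster than linearly in each width, which is in line with the claim that wider networks contain more functionally redundant parameterizations.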

Abstract

We investigate the complexity of deep neural networks through the lens of functional equivalence, which posits that different parameterizations can yield the same network function. Leveraging the equivalence property, we present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced. Additionally, we demonstrate that functional equivalence benefits optimization, as overparameterized networks tend to be easier to train since increasing network width leads to a diminishing volume of the effective parameter space. These findings can offer valuable insights into the phenomenon of overparameterization and have implications for understanding generalization and optimization in deep learning.

Paper Structure

This paper contains 19 sections, 8 theorems, 86 equations, and 2 tables.

Key Result

Proposition 1

Consider two neural networks $f(x;\theta_1)$ and $f(x;\theta_2)$ that share the same architecture and activation functions $\sigma_1,\ldots,\sigma_L$ but have different parameters $\theta_1$ and $\theta_2$, where $x \in \mathbb{R}^n$ is the network input. Let $P^\top$ denote the transpose of a matrix $P$. If there exist permutation matrices $P_1,\ldots,P_L$ such that …, then $f(x;\theta_1)$ and $f(x;\theta_2)$ are functionally equivalent.
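The exact permutation condition of Proposition 1 is cut off above. A common form of this kind of statement is $W_l \mapsto P_l W_l P_{l-1}^\top$ and $b_l \mapsto P_l b_l$ (with $P_0$ and $P_{L+1}$ the identity). Assuming that convention and ReLU activations, the sketch below checks numerically that reordering each hidden layer's units leaves the network function unchanged. The helpers (`forward`, `permute_hidden`, `random_net`) are hypothetical illustrations, not the paper's code.

```python
import random

def relu(v):
    # Elementwise ReLU on a list of floats.
    return [max(0.0, a) for a in v]

def affine(W, b, h):
    # Computes W h + b, with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]

def forward(x, params):
    """params = [(W_1, b_1), ..., (W_{L+1}, b_{L+1})]; ReLU after every hidden layer."""
    h = x
    for l, (W, b) in enumerate(params):
        z = affine(W, b, h)
        h = relu(z) if l < len(params) - 1 else z  # linear output layer
    return h

def permute_hidden(params, perms):
    """Applies permutation perms[l] to the units of hidden layer l.

    Matrix view (assumed convention): W_l -> P_l W_l P_{l-1}^T and b_l -> P_l b_l,
    i.e. reorder the rows of (W_l, b_l) and the columns of the next layer's W.
    """
    assert len(perms) == len(params) - 1, "one permutation per hidden layer"
    new = list(params)
    for l, p in enumerate(perms):
        W, b = new[l]
        new[l] = ([W[i] for i in p], [b[i] for i in p])                 # rows of layer l
        W_next, b_next = new[l + 1]
        new[l + 1] = ([[row[i] for i in p] for row in W_next], b_next)  # columns of layer l+1
    return new

def random_net(sizes, rng):
    # Gaussian weights and biases for layer sizes [n, d_1, ..., d_L, output_dim].
    return [([[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0, 1) for _ in range(n_out)])
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = random.Random(0)
sizes = [3, 5, 4, 2]  # input n = 3, hidden widths d_1 = 5 and d_2 = 4, output dim 2
theta1 = random_net(sizes, rng)
perms = [rng.sample(range(d), d) for d in sizes[1:-1]]
theta2 = permute_hidden(theta1, perms)
x = [rng.uniform(-1, 1) for _ in range(sizes[0])]
print(forward(x, theta1))
print(forward(x, theta2))  # same values up to floating-point rounding
```

Because any of the $d_1!\cdots d_L!$ choices of `perms` gives the same function, generic parameter vectors come in large groups of equivalent copies, which is the redundancy the covering-number argument factors out.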

Theorems & Definitions (33)

  • Definition 1: Functionally-Equivalent Neural Networks
  • Example 1: Scaling
  • Example 2: Sign Flipping
  • Example 3: Permutation
  • Proposition 1: Permutation equivalence for deep FNNs
  • Definition 2: Covering Number
  • Remark 1
  • Theorem 1: Covering number of shallow neural networks
  • Remark 2
  • Theorem 2: Covering number of deep neural networks
  • ...and 23 more
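Only the titles of Examples 1 and 2 appear above. Assuming they refer to the standard symmetries (positive homogeneity of ReLU for scaling, and odd activations such as tanh for sign flipping), the sketch below checks both on a scalar one-hidden-layer network. The helper names are hypothetical and not taken from the paper.

```python
import math

def one_hidden(x, w_in, b_in, w_out, act):
    # Scalar input and output: sum_j w_out[j] * act(w_in[j] * x + b_in[j]).
    return sum(v * act(w * x + b) for w, b, v in zip(w_in, b_in, w_out))

def relu(z):
    return max(0.0, z)

def rescale_unit(w_in, b_in, w_out, j, c):
    """Scaling: relu(c*z) = c*relu(z) for c > 0, so multiplying unit j's incoming
    weight and bias by c and dividing its outgoing weight by c preserves f."""
    if c <= 0:
        raise ValueError("ReLU is only positively homogeneous; need c > 0")
    w_in, b_in, w_out = list(w_in), list(b_in), list(w_out)
    w_in[j] *= c
    b_in[j] *= c
    w_out[j] /= c
    return w_in, b_in, w_out

def flip_unit(w_in, b_in, w_out, j):
    """Sign flipping: tanh(-z) = -tanh(z), so negating unit j's incoming weight,
    bias, and outgoing weight preserves f for an odd activation such as tanh."""
    w_in, b_in, w_out = list(w_in), list(b_in), list(w_out)
    w_in[j], b_in[j], w_out[j] = -w_in[j], -b_in[j], -w_out[j]
    return w_in, b_in, w_out

theta = ([0.5, -1.2, 2.0], [0.1, 0.3, -0.7], [1.0, -0.4, 0.8])
scaled = rescale_unit(*theta, j=1, c=3.0)
flipped = flip_unit(*theta, j=2)
for x in (-2.0, -0.5, 0.0, 1.3):
    # Each pair of columns should agree up to floating-point rounding.
    print(one_hidden(x, *theta, relu), one_hidden(x, *scaled, relu),
          one_hidden(x, *theta, math.tanh), one_hidden(x, *flipped, math.tanh))
```

Unlike permutations, scaling gives a continuous family of equivalent parameters for each ReLU unit, while sign flipping gives a discrete factor of two per unit for odd activations.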