Exploring the Complexity of Deep Neural Networks through Functional Equivalence

Guohao Shen

TL;DR

This work studies how functional equivalence among parameterizations creates redundancy in deep neural networks and derives tighter bounds on the covering number by exploiting permutation invariance. It constructs representative parameter sets that factor out this symmetry and shows that the effective parameter space volume shrinks by a factorial factor $d_1!\cdots d_L!$, with explicit bounds expressed through network width, depth, and parameter norms. The analysis extends to CNNs, ResNets, and attention modules, and connects these bounds to improved generalization and easier optimization in overparameterized regimes: the estimation error term is reduced and convergent solutions become more likely. The results offer a principled account of overparameterization and practical insights into capacity control and training dynamics in modern architectures.
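To give a feel for the size of the factorial factor $d_1!\cdots d_L!$, the following minimal sketch computes it on a log scale. It assumes that $d_1,\ldots,d_L$ denote the hidden-layer widths (the summary does not define them), and the function name `log_symmetry_factor` and the example widths are hypothetical, chosen only for illustration.

```python
import math


def log_symmetry_factor(hidden_widths):
    """Return log(d_1! * ... * d_L!) for the given widths d_1, ..., d_L.

    Assumption: each d_l is the width of hidden layer l.
    """
    # math.lgamma(d + 1) equals log(d!), which avoids overflow for wide layers.
    return sum(math.lgamma(d + 1) for d in hidden_widths)


# Hypothetical small network with hidden widths 3 and 4: 3! * 4! = 144.
small = round(math.exp(log_symmetry_factor([3, 4])))
print(small)

# For wider layers the factor is astronomically large, so stay on the log scale.
print(log_symmetry_factor([256, 256]))
```

Working with `math.lgamma` rather than `math.factorial` keeps the computation in floating point, which is convenient when the widths are in the hundreds or thousands and only the order of magnitude matters.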

Abstract

We investigate the complexity of deep neural networks through the lens of functional equivalence, which posits that different parameterizations can yield the same network function. Leveraging the equivalence property, we present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced. Additionally, we demonstrate that functional equivalence benefits optimization, as overparameterized networks tend to be easier to train since increasing network width leads to a diminishing volume of the effective parameter space. These findings can offer valuable insights into the phenomenon of overparameterization and have implications for understanding generalization and optimization in deep learning.

Paper Structure (19 sections, 8 theorems, 86 equations, 2 tables)

Introduction
Related work
Our contributions
Functionally equivalent Neural Networks
Shallow Feed-Forward Neural Networks
Deep Feed-Forward Neural Networks
Comparing to existing results
Extension to other neural networks
Convolutional neural networks
Residual Networks
Attention-based Networks
Implications to generalization and optimization
Conclusion
Proof of Theorems
Proof of Theorem \ref{thm_perm}
...and 4 more sections

Key Result

Proposition 1

Consider two neural networks $f(x;\theta_1)$ and $f(x;\theta_2)$ with the same architecture and the same activations $\sigma_1,\ldots,\sigma_L$ but different parameters $\theta_1$ and $\theta_2$, where $x \in \mathbb{R}^n$ is the input to the network. Let $P^\top$ denote the transpose of matrix $P$. If there exist permutation matrices $P_1,\ldots,P_{L}$ such that … then $f(x;\theta_1)$ and $f(x;\theta_2)$ …
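The permutation equivalence behind Proposition 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a fully connected network with elementwise ReLU activations, and it assumes the usual form of the condition, in which the rows of layer $l$'s weights and bias are reordered by $P_l$ and the columns of the next layer's weights are reordered to match. All function names, layer sizes, and the random seed are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, t) for t in v]


def affine(W, b, x):
    # Compute W x + b for a weight matrix stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(params, x):
    """params is a list of (W, b); ReLU follows every layer except the last."""
    h = x
    for W, b in params[:-1]:
        h = relu(affine(W, b, h))
    W, b = params[-1]
    return affine(W, b, h)


def permute_hidden(params, perms):
    """Reorder the units of hidden layer l by perms[l] and adjust the next layer."""
    new_params, prev = [], None
    for l, (W, b) in enumerate(params):
        if prev is not None:
            # Permute columns so each weight still meets the unit it was trained on.
            W = [[row[j] for j in prev] for row in W]
        if l < len(perms):
            p = perms[l]
            W = [W[i] for i in p]  # permute rows (the units of this layer)
            b = [b[i] for i in p]
            prev = p
        else:
            prev = None  # the output layer is not permuted
        new_params.append((W, b))
    return new_params


random.seed(0)
sizes = [2, 3, 4, 1]  # hypothetical: input 2, hidden widths 3 and 4, output 1
params = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [random.uniform(-1, 1) for _ in range(n_out)])
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
perms = [random.sample(range(d), d) for d in sizes[1:-1]]

x = [0.5, -1.2]
y_original = forward(params, x)
y_permuted = forward(permute_hidden(params, perms), x)
max_gap = max(abs(a - b) for a, b in zip(y_original, y_permuted))
print(max_gap)  # differs from zero only by floating-point rounding
```

The comparison uses a tolerance rather than exact equality because permuting the hidden units changes the order of summation in the following layer, which can alter the last few bits of the result.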

Theorems & Definitions (33)

Definition 1: Functionally-Equivalent Neural Networks
Example 1: Scaling
Example 2: Sign Flipping
Example 3: Permutation
Proposition 1: Permutation equivalence for deep FNNs
Definition 2: Covering Number
Remark 1
Theorem 1: Covering number of shallow neural networks
Remark 2
Theorem 2: Covering number of deep neural networks
...and 23 more