Spectral complexity of deep neural networks

Simmaco Di Lillo; Domenico Marinucci; Michele Salvi; Stefano Vigogna

TL;DR

This work introduces a spectral framework for understanding depth in neural networks by studying the angular power spectrum of their infinite-width limits, viewed as isotropic random fields on the sphere. It defines a depth-dependent spectral law $X_L$ with distribution $P(X_L=\ell)=D_{\ell;\kappa_L}$ and classifies architectures into low-disorder, sparse, or high-disorder regimes based on $\kappa'(1)$. The authors prove regime-dependent asymptotics for the spectral moments and for the limiting behavior of the random fields and their derivatives, showing that ReLU networks exhibit a sparse, low-frequency structure with high Sobolev energy, while tanh-type networks become increasingly oscillatory; they also introduce the spectral effective support and the spectral effective dimension as practical complexity measures. Numerical experiments based on Monte Carlo simulations and HEALPix corroborate the theory, revealing sharp differences across activation functions and depths. The results offer a principled, depth-aware notion of complexity and point to future directions in the geometry of random fields, finite-width-depth regimes, and extensions to convolutional architectures.
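To make the spectral law concrete, the following is a minimal sketch of how the weights $D_{\ell;\kappa_L}$ could be computed numerically. It rests on several assumptions not stated in this summary: the sphere is $S^2$, the depth-$L$ kernel $\kappa_L$ is the $L$-fold composition of $\kappa$ with itself, $D_{\ell;\kappa_L}$ denotes the coefficients of $\kappa_L$ in the Legendre expansion $\kappa_L(u)=\sum_\ell D_\ell P_\ell(u)$, and the activation is ReLU with the standard normalized arc-cosine kernel (so that $\kappa(1)=1$). The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander


def relu_kernel(u):
    """Normalized ReLU (arc-cosine) kernel on [-1, 1], with kappa(1) = 1."""
    u = np.clip(u, -1.0, 1.0)  # guard against rounding slightly outside [-1, 1]
    return (np.sqrt(1.0 - u**2) + u * (np.pi - np.arccos(u))) / np.pi


def iterate_kernel(kappa, u, depth):
    """Assumed depth-L kernel: kappa composed with itself `depth` times."""
    for _ in range(depth):
        u = kappa(u)
    return u


def spectral_law(kappa, depth, lmax=200, nquad=2000):
    """Legendre weights D_0..D_lmax of kappa_L on S^2 (assumed convention).

    Uses D_l = (2l + 1)/2 * integral of kappa_L(u) P_l(u) over [-1, 1],
    evaluated with Gauss-Legendre quadrature.
    """
    nodes, weights = leggauss(nquad)
    values = iterate_kernel(kappa, nodes, depth)
    P = legvander(nodes, lmax)                 # P[i, l] = P_l(nodes[i])
    ells = np.arange(lmax + 1)
    return (2 * ells + 1) / 2.0 * (P.T @ (weights * values))


if __name__ == "__main__":
    for depth in (1, 5, 20):
        D = spectral_law(relu_kernel, depth)
        print(depth, D[:4], D.sum())
```

Because $P_\ell(1)=1$ and $\kappa_L(1)=1$, the weights sum to one (up to truncation at `lmax`), which is what allows them to be read as a probability distribution for $X_L$. For depth 1 the first two weights should come out close to $3/8$ and $1/2$, the exact Legendre coefficients of this kernel.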

Abstract

It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.

Paper Structure (14 sections, 10 theorems, 102 equations, 2 figures, 4 tables)

Introduction
Background and notation
Isotropic random fields on the sphere
Random neural networks
Main results
Idea of the proofs
Numerical evidence
Conclusions and future work
Proofs
On the link between kernel derivatives and spectral moments
$\kappa'(1) \neq 1$
$\kappa'(1) = 1$
Proof of main theorems
Kernels associated to the Gaussian and the hyperbolic tangent

Key Result

Theorem 3.1

Let $\kappa:[-1,1]\to\mathbb R$ be given by $\mathbb E[T_1(x)T_1(y)] = \kappa(\langle x, y \rangle)$, and assume that $\kappa\in C^1$.
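The covariance identity defining $\kappa$ can be checked by simulation. The sketch below is illustrative only and makes assumptions beyond the statement above: $T_1$ is taken to be a network with one hidden ReLU layer and i.i.d. standard Gaussian weights, the factor 2 is a normalization chosen so that $\kappa(1)=1$, and the closed form is the standard arc-cosine kernel. The function names are hypothetical.

```python
import numpy as np


def mc_kernel(u, dim=3, width=200_000, seed=0):
    """Monte Carlo estimate of kappa(u) for one hidden ReLU layer (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Two unit vectors x, y in R^dim with inner product <x, y> = u.
    x = np.zeros(dim)
    x[0] = 1.0
    y = np.zeros(dim)
    y[0] = u
    y[1] = np.sqrt(1.0 - u**2)
    W = rng.standard_normal((width, dim))   # hidden-layer weights
    hx = np.maximum(W @ x, 0.0)             # ReLU features at x
    hy = np.maximum(W @ y, 0.0)             # ReLU features at y
    return 2.0 * np.mean(hx * hy)           # factor 2 gives kappa(1) = 1


def closed_form(u):
    """Normalized arc-cosine kernel, used here as the reference value."""
    return (np.sqrt(1.0 - u**2) + u * (np.pi - np.arccos(u))) / np.pi


if __name__ == "__main__":
    for u in (-0.5, 0.0, 0.3, 1.0):
        print(u, mc_kernel(u), closed_form(u))
```

The estimate depends on $x$ and $y$ only through $\langle x, y\rangle$, which is the isotropy property that makes a one-dimensional kernel $\kappa$ on $[-1,1]$ sufficient.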

Figures (2)

Figure 1: Mollweide projection of a random neural network $T_L: S^2 \to \mathbb{R}$ with varying depth $L=1,20$ (from top to bottom). The activation functions are $\sigma_1(x) = e^{-x^2/2}$, $\sigma_2(x) = \max(0,x)$ and $\sigma_3(x) = \tanh(x)$ (from left to right). The size of the hidden layers is fixed at $1000$ neurons and the resolution of the map is $0.11$ deg. The fields were obtained by estimating the angular power spectrum via Monte Carlo (1000 samples) and drawing one realization of the random spherical harmonic coefficients. Note that the color ranges differ from plot to plot. See also Table \ref{tab::max_min}, which displays the range of values assumed by the fields; in the plot, the values of the field are rounded to the third decimal digit.
Figure 2: Same as Figure \ref{fig:3casi}: ReLU on the left, tanh on the right. The maps corresponding to Gaussian activations are dropped because they are approximately constant on the sphere.

Theorems & Definitions (26)

Theorem 3.1
Remark 3.2
Theorem 3.3
Remark 3.4
Definition 3.5
Remark 3.6: Interpretation of spectral support and dimension
Remark 3.7: Spectral complexity
Remark 3.8: Spikes in ReLU networks
Proposition A.1
proof
...and 16 more
