Spectral complexity of deep neural networks

Simmaco Di Lillo, Domenico Marinucci, Michele Salvi, Stefano Vigogna

TL;DR

This work introduces a spectral framework for understanding depth in neural networks: it studies the angular power spectrum of the infinite-width limit, viewed as an isotropic random field on the sphere. It defines a depth-dependent spectral law $X_L$ with distribution $P(X_L=\ell)=D_{\ell;\kappa_L}$ and classifies architectures into low-disorder, sparse, or high-disorder regimes according to the value of $\kappa'(1)$. The authors prove regime-dependent asymptotics for the spectral moments and for the limiting behavior of the random fields and their derivatives, showing that ReLU networks exhibit a sparse, low-frequency structure with high Sobolev energy, while tanh-type networks become increasingly oscillatory as depth grows; they also introduce spectral effective support and spectral effective dimension as practical complexity measures. Numerical experiments based on Monte Carlo simulations and HEALPix corroborate the theory, revealing sharp differences across activation functions and depths. The results offer a principled, depth-aware notion of complexity and point to future directions in the geometry of random fields, finite-width-depth regimes, and extensions to convolutional architectures.
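The link between $\kappa'(1)$ and the moments of $X_L$ can be sketched on $S^2$ under assumptions of ours that the summary does not spell out: the kernel is normalized so that $\kappa(1)=1$, the depth-$L$ kernel is the composition $\kappa_L=\kappa\circ\kappa_{L-1}$, and $D_{\ell;\kappa_L}$ are the Legendre coefficients of $\kappa_L$. They are nonnegative because $\kappa_L$ is an isotropic covariance (Schoenberg), and since $P_\ell(1)=1$ they sum to one. Using $P_\ell'(1)=\ell(\ell+1)/2$ and differentiating termwise at $t=1$ (legitimate for nonnegative coefficients and $\kappa\in C^1$), a worked version reads:

```latex
\begin{aligned}
\kappa_L(t) &= \sum_{\ell\ge 0} D_{\ell;\kappa_L}\,P_\ell(t),
  \qquad D_{\ell;\kappa_L}\ge 0,
  \qquad \sum_{\ell\ge 0} D_{\ell;\kappa_L} = \kappa_L(1) = 1,\\
\kappa_L'(1) &= \kappa'\big(\kappa_{L-1}(1)\big)\,\kappa_{L-1}'(1)
  = \kappa'(1)\,\kappa_{L-1}'(1) = \kappa'(1)^L,\\
\mathbb E\big[X_L(X_L+1)\big] &= \sum_{\ell\ge 0} \ell(\ell+1)\,D_{\ell;\kappa_L}
  = 2\sum_{\ell\ge 0} D_{\ell;\kappa_L}\,P_\ell'(1) = 2\,\kappa_L'(1) = 2\,\kappa'(1)^L.
\end{aligned}
```

Under these assumptions, the second moment tends to $0$, stays equal to $2$, or diverges as $L\to\infty$ according to whether $\kappa'(1)<1$, $\kappa'(1)=1$, or $\kappa'(1)>1$. This is one natural reading of the low-disorder, sparse, and high-disorder regimes; the precise asymptotics are the ones proved in the paper.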

Abstract

It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.
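A minimal sketch of this classification for the three activations used in the figures follows. It rests on assumptions of ours: bias-free layers with standard Gaussian weights, a kernel normalized so that $\kappa(1)=1$, and the thresholds $\kappa'(1)<1$ (low-disorder), $\kappa'(1)=1$ (sparse), $\kappa'(1)>1$ (high-disorder). With that normalization, $\kappa(\rho)=\mathbb E[\sigma(U)\sigma(V)]/\mathbb E[\sigma(Z)^2]$ for standard Gaussians $U,V$ with correlation $\rho$, and the Hermite expansion of $\sigma$ gives $\kappa'(1)=\mathbb E[\sigma'(Z)^2]/\mathbb E[\sigma(Z)^2]$. The helper names (`gaussian_mean`, `kappa_prime_at_one`, `regime`) are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_mean(f, deg=100):
    """Approximate E[f(Z)], Z ~ N(0, 1), with probabilists' Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(deg)  # weight function exp(-x^2 / 2), weights sum to sqrt(2*pi)
    return float(np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi))


def kappa_prime_at_one(sigma, dsigma, deg=100):
    """kappa'(1) = E[sigma'(Z)^2] / E[sigma(Z)^2] for the normalized one-layer kernel (assumed)."""
    num = gaussian_mean(lambda z: dsigma(z) ** 2, deg)
    den = gaussian_mean(lambda z: sigma(z) ** 2, deg)
    return num / den


def regime(kp, tol=1e-8):
    """Map kappa'(1) to a regime; the thresholds are our assumption (see lead-in)."""
    if kp < 1.0 - tol:
        return "low-disorder"
    if kp > 1.0 + tol:
        return "high-disorder"
    return "sparse"


# (activation, derivative) pairs. An even quadrature degree keeps x = 0 out of
# the nodes, so the ReLU kink is never evaluated.
ACTIVATIONS = {
    "gaussian": (lambda x: np.exp(-x**2 / 2), lambda x: -x * np.exp(-x**2 / 2)),
    "relu": (lambda x: np.maximum(x, 0.0), lambda x: (x > 0).astype(float)),
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
}

for name, (s, ds) in ACTIVATIONS.items():
    kp = kappa_prime_at_one(s, ds)
    print(f"{name:9s} kappa'(1) = {kp:.4f} -> {regime(kp)}")
```

Under these assumptions the sketch places the Gaussian activation in the low-disorder regime ($\kappa'(1)=1/3$ exactly), ReLU in the sparse regime ($\kappa'(1)=1$), and tanh in the high-disorder regime. Gaussian quadrature is used instead of sampling so that the ReLU value lands on the threshold up to rounding error rather than Monte Carlo noise.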

Paper Structure

This paper contains 14 sections, 10 theorems, 102 equations, 2 figures, and 4 tables.

Key Result

Theorem 3.1

Let $\kappa:[-1,1]\to\mathbb R$ be defined by $\mathbb E[T_1(x)T_1(y)] = \kappa(\langle x, y \rangle)$, and assume that $\kappa\in C^1$.
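The defining identity $\mathbb E[T(x)T(y)]=\kappa(\langle x,y\rangle)$ can be checked by Monte Carlo at finite width. The sketch below does this for ReLU at depth $L$ under assumptions of ours, not the paper's exact parameterization: inputs on $S^2$, bias-free layers, first-layer weights $N(0,1)$, He-scaled later weights $N(0,2/n)$, and $\kappa_L=\kappa^{\circ L}$ with the normalized arc-cosine kernel $\kappa(t)=\big(\sqrt{1-t^2}+(\pi-\arccos t)\,t\big)/\pi$, which satisfies $\kappa(1)=1$. The width, depth, sample count, and function names are illustrative.

```python
import numpy as np


def relu_kernel(t):
    """Normalized ReLU (arc-cosine) kernel: kappa(1) = 1 and kappa'(1) = 1."""
    t = np.clip(t, -1.0, 1.0)
    return (np.sqrt(1.0 - t**2) + (np.pi - np.arccos(t)) * t) / np.pi


def kappa_L(t, depth):
    """Depth-L kernel, assumed to be the L-fold composition of kappa."""
    for _ in range(depth):
        t = relu_kernel(t)
    return t


def random_relu_field(points, depth, width, rng):
    """One draw of a random ReLU network T_L evaluated at the rows of `points`."""
    z = points @ rng.standard_normal((points.shape[1], width))  # first layer, N(0, 1) weights
    for _ in range(depth - 1):
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)  # He scaling
        z = np.maximum(z, 0.0) @ w
    readout = rng.standard_normal(width) * np.sqrt(2.0 / width)
    return np.maximum(z, 0.0) @ readout  # `depth` nonlinearities in total


rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])  # <x, y> = 0
points = np.stack([x, y])
depth, width, n_samples = 3, 128, 2000

# Each sample draws fresh weights, so averaging T(x)T(y) estimates the covariance.
draws = np.array([random_relu_field(points, depth, width, rng) for _ in range(n_samples)])
mc_cov = float(np.mean(draws[:, 0] * draws[:, 1]))
print(f"Monte Carlo E[T(x)T(y)] = {mc_cov:.3f}, kappa_L(0) = {kappa_L(0.0, depth):.3f}")
```

The estimate should agree with $\kappa_L(0)\approx 0.60$ up to Monte Carlo error and a finite-width bias of order $1/n$; at depth one the identity holds exactly for any width, because the first-layer preactivations are exactly Gaussian.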

Figures (2)

  • Figure 1: Mollweide projection of a random neural network $T_L: S^2 \to \mathbb{R}$ at depths $L=1$ and $L=20$ (top and bottom rows). The activation functions are $\sigma_1(x) = e^{-x^2/2}$, $\sigma_2(x) = \max(0,x)$ and $\sigma_3(x) = \tanh(x)$ (from left to right). Each hidden layer has $1000$ neurons, and the map resolution is $0.11^\circ$. The fields were obtained by estimating the angular power spectrum via Monte Carlo (1000 samples) and drawing one realization of the random spherical harmonic coefficients. Note that the color ranges differ from plot to plot; see also the table labelled tab::max_min, which lists the range of values taken by each field. In the plots, field values are rounded to three decimal places.
  • Figure 2: Same as Figure 1, with ReLU on the left and tanh on the right. The maps for the Gaussian activation are omitted because they are approximately constant on the sphere.
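The spectral law $P(X_L=\ell)=D_{\ell;\kappa_L}$ behind these maps can also be approximated directly from a closed-form kernel, instead of the Monte Carlo estimate used for the figures. The sketch assumes, as our own reading, that on $S^2$ the $D_{\ell;\kappa_L}$ are the Legendre coefficients $D_{\ell;\kappa_L}=\tfrac{2\ell+1}{2}\int_{-1}^{1}\kappa_L(t)P_\ell(t)\,dt$, with the normalized ReLU kernel and $\kappa_L=\kappa^{\circ L}$. The truncation level `lmax`, the quadrature size, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander


def relu_kernel(t):
    """Normalized ReLU (arc-cosine) kernel with kappa(1) = 1."""
    t = np.clip(t, -1.0, 1.0)
    return (np.sqrt(1.0 - t**2) + (np.pi - np.arccos(t)) * t) / np.pi


def spectral_law(depth, lmax=100, n_quad=1000):
    """Approximate D_{l; kappa_L}, l = 0..lmax, by Gauss-Legendre quadrature."""
    t, w = leggauss(n_quad)
    k = t.copy()
    for _ in range(depth):  # kappa_L assumed to be kappa composed L times
        k = relu_kernel(k)
    P = legvander(t, lmax)  # P[i, l] = P_l(t_i)
    ells = np.arange(lmax + 1)
    # Legendre coefficient: (2l + 1) / 2 * integral of kappa_L(t) P_l(t) over [-1, 1]
    return (2 * ells + 1) / 2.0 * (P.T @ (w * k))


for depth in (1, 20):
    D = spectral_law(depth)
    ells = np.arange(D.size)
    print(f"L={depth:2d}: sum D = {D.sum():.4f}, P(X_L=0) = {D[0]:.3f}, "
          f"truncated E[X_L] = {np.dot(ells, D):.3f}")
```

The truncated coefficients sum to one up to a negligible tail, and the mass at $\ell=0$ increases with depth: $\kappa(t)\ge t$ and $\kappa$ is increasing, so $\kappa_L$ grows pointwise in $L$. This is consistent with the sparse, low-frequency picture for ReLU. Truncated higher moments converge slowly, because the coefficients decay only polynomially in $\ell$.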

Theorems & Definitions (26)

  • Theorem 3.1
  • Remark 3.2
  • Theorem 3.3
  • Remark 3.4
  • Definition 3.5
  • Remark 3.6: Interpretation of spectral support and dimension
  • Remark 3.7: Spectral complexity
  • Remark 3.8: Spikes in ReLU networks
  • Proposition A.1
  • Proof
  • ...and 16 more