Revisiting Deep Information Propagation: Fractal Frontier and Finite-size Effects

Giuseppe Alessio D'Inverno, Zhiyuan Hu, Leo Davy, Michael Unser, Gianluigi Rozza, Jonathan Dong

TL;DR

Problem: how information propagates in finite-width deep networks, and how the order-to-chaos boundary behaves away from the mean-field limit. Approach: extend mean-field analysis to finite width across MLPs, CNNs, and networks built from Fourier-based structured transforms; define a forward divergence metric $L^{(d)}$ and a backpropagation metric $L'$; and study the fractal frontier via box counting. Contributions: show a fractal boundary between stable and chaotic propagation in both the forward and backward passes, estimate fractal dimensions of roughly 1.6–1.9 across architectures, and demonstrate a clear finite-depth tradeoff between separation and robustness. Significance: reveals an intrinsic dynamical complexity of architectures that is independent of data and optimization, which informs initialization and depth choices for robust information flow in practice.

Abstract

Information propagation characterizes how input correlations evolve across layers in deep neural networks. This framework has been well studied using mean-field theory, which assumes infinitely wide networks. However, these assumptions break down for practical, finite-size networks. In this work, we study information propagation in randomly initialized neural networks with finite width and reveal that the boundary between ordered and chaotic regimes exhibits a fractal structure. This shows the fundamental complexity of neural network dynamics, in a setting that is independent of input data and optimization. To extend this analysis beyond multilayer perceptrons, we leverage recently introduced Fourier-based structured transforms, and show that information propagation in convolutional neural networks also follows the same behavior. In practice, our investigation highlights the importance of finite network depth with respect to the tradeoff between separation and robustness.
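As an illustration of the kind of experiment the abstract describes, the following is a minimal sketch of information propagation in a finite-width random MLP. It is not the authors' code. The function name `divergence`, the `tanh` activation, the Gaussian inputs, the $\sigma_w / \sqrt{N}$ weight scaling, and the normalization of the distance by $\sqrt{N}$ are all assumptions chosen for illustration.

```python
import numpy as np


def divergence(sigma_w, sigma_b, width=100, depth=200, seed=0):
    """Distance between the pre-activations of two inputs after `depth` layers.

    Hypothetical sketch: both inputs pass through the same randomly
    initialized MLP, and the normalized Euclidean distance between the
    final pre-activations is returned.
    """
    rng = np.random.default_rng(seed)
    # Two distinct random inputs, used directly as initial pre-activations
    # (an assumption; the paper's exact input handling is not given here).
    h1 = rng.standard_normal(width)
    h2 = rng.standard_normal(width)
    for _ in range(depth):
        # Fresh i.i.d. Gaussian weights and biases at every layer,
        # with the usual mean-field scaling of the weight variance.
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        # The same layer is applied to both signals.
        h1 = W @ np.tanh(h1) + b
        h2 = W @ np.tanh(h2) + b
    return np.linalg.norm(h1 - h2) / np.sqrt(width)


if __name__ == "__main__":
    # Scan a coarse grid of (sigma_w, sigma_b) to get a small landscape.
    sigma_ws = np.linspace(0.5, 3.0, 6)
    sigma_bs = np.linspace(0.0, 1.0, 3)
    for sb in sigma_bs:
        row = [divergence(sw, sb, width=50, depth=100) for sw in sigma_ws]
        print(f"sigma_b={sb:.2f}:", np.round(row, 3))
```

Under these assumptions, small $\sigma_w$ drives the two signals together (the ordered regime), while large $\sigma_w$ keeps them apart (the chaotic regime). Evaluating the function on a fine grid of $(\sigma_w, \sigma_b)$ would give a landscape of the kind the paper analyzes.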

Paper Structure

This paper contains 12 sections, 13 equations, 7 figures.

Figures (7)

  • Figure 1: Graphical abstract. (Left) Information propagation experiment. Two different inputs are fed into a feedforward network, and the divergence metric is defined as the distance between their pre-activations. (Right) Fractal behavior of the information propagation landscape. The divergence metric $L^{(D)}$ as a function of $\sigma_w$ and $\sigma_b$ (at depth $D = 10^3$ for a CNN of size $N = 100$ with $\operatorname{erf}$ activation function) exhibits a fractal structure at the boundary between stability and information propagation.
  • Figure 2: Universality of the frontier between stability and information propagation. Divergence metric $L^{(D)}$ as a function of $\sigma_w$ and $\sigma_b$ for random fully-connected (MLP), random convolutional (CNN), and structured random networks of the form $W = FDF$ or $W = FDFD$, built from Fourier transforms and random diagonal matrices. All landscapes are computed for depth $D=10^3$ and width $N=10^3$.
  • Figure 3: Convergence of the different architectures towards the mean-field limit. Examples of information propagation landscapes for different widths, from small ($N=20$) to large ($N = 10^3$), and different architectures (MLP, CNN, structured random).
  • Figure 4: Zoom-in on the boundary between stability and information propagation. Sequence of 4 images of the divergence metric $L^{(D)}$ as a function of $\sigma_w$ and $\sigma_b$, over progressively smaller ranges, computed for an MLP of depth $D = 10^3$ and size $N = 10^2$.
  • Figure 5: a) Summary of the fractal analysis of the boundary for the divergence metric $L^{(D)}$ as a function of $\sigma_w$ and $\sigma_b$, over progressively smaller ranges, computed for an MLP. Left: box count versus box size on log-log axes. For each zoom level, the estimated fractal dimension is given by the slope of the linear fit. Right: variation of the fractal dimension with the threshold $\tau$ for each zoomed-in image. The threshold that yields the maximum fractal dimension is then used for the rest of the analysis. The color code for each plot follows that of Fig. \ref{fig:finite-size-fractals}, with blue corresponding to the most zoomed-out image. b) Estimation of the fractal dimension for a CNN. c) Estimation of the fractal dimension for a structured random network of the form $FDF$.
  • ...and 2 more figures
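
The box-counting procedure mentioned in the Figure 5 caption can be sketched as follows. This is a generic box-counting estimate, not the authors' implementation. The helper names `boundary_mask` and `box_counting_dimension`, the neighbor-difference rule used to extract the boundary from a thresholded landscape, and the default box sizes are assumptions made for illustration.

```python
import numpy as np


def boundary_mask(landscape, tau):
    """Mark pixels where the thresholded landscape changes value.

    Hypothetical helper: a pixel is on the boundary if its thresholded
    value differs from the pixel below it or to its right.
    """
    above = np.asarray(landscape) > tau
    edge = np.zeros_like(above)
    edge[:-1, :] |= above[:-1, :] != above[1:, :]
    edge[:, :-1] |= above[:, :-1] != above[:, 1:]
    return edge


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a non-empty boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in box_sizes:
        # Crop so that the image splits evenly into s-by-s boxes.
        h = (mask.shape[0] // s) * s
        w = (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        # Count the boxes that contain at least one boundary pixel.
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    # The count scales as (box size)^(-dimension), so the dimension is
    # minus the slope of the log-log linear fit.
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope


if __name__ == "__main__":
    # Sanity check on a synthetic landscape with a straight boundary.
    landscape = np.zeros((64, 64))
    landscape[:, 32:] = 1.0
    print(box_counting_dimension(boundary_mask(landscape, tau=0.5)))
```

A straight boundary gives a dimension close to 1 and a filled region gives a dimension close to 2, so a value between the two indicates a boundary rougher than a smooth curve. Repeating the estimate over several thresholds $\tau$ would mirror the threshold sweep described in the caption.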