Tuning Universality in Deep Neural Networks

Arsham Ghavasieh

TL;DR

A stochastic theory of deep information propagation (DIP) is derived by incorporating Central Limit Theorem (CLT)-level fluctuations, and activation function design is shown to control the collective dynamics in random DNNs.

Abstract

Deep neural networks (DNNs) exhibit crackling-like avalanches whose origin lacks a mechanistic explanation. Here, I derive a stochastic theory of deep information propagation (DIP) by incorporating Central Limit Theorem (CLT)-level fluctuations. Four effective couplings $(r, h, D_1, D_2)$ characterize the dynamics, yielding a Landau description of the static exponents and a Directed Percolation (DP) structure of activity cascades. Tuning the couplings selects between avalanche dynamics generated by a Brownian Motion (BM) in a logarithmic trap and those generated by an absorbed free BM, each corresponding to a distinct universality class. Numerical simulations confirm the theory and demonstrate that activation function design controls the collective dynamics in random DNNs.
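
To make the "absorbed free BM" picture concrete, the following is a minimal sketch, not the paper's stochastic DIP equations. It assumes a discrete $\pm 1$ random walk as a stand-in for Brownian motion, started at 1 and absorbed at 0. The avalanche duration is taken as the first-passage time and the size as the area under the path. The function name `simulate_avalanche`, the cap `max_steps`, and the sample count are illustrative assumptions.

```python
import numpy as np

def simulate_avalanche(rng, max_steps=10_000):
    """One avalanche from a +/-1 random walk started at 1 and absorbed at 0.

    Returns (duration, size): duration is the number of steps until
    absorption (capped at max_steps, an illustrative assumption), and
    size is the area under the path accumulated before absorption.
    """
    x = 1
    duration = 0
    size = 0
    while x > 0 and duration < max_steps:
        size += x                                # accumulate activity
        x += 1 if rng.random() < 0.5 else -1     # unbiased step
        duration += 1
    return duration, size

rng = np.random.default_rng(0)
samples = [simulate_avalanche(rng) for _ in range(2000)]
durations = np.array([d for d, _ in samples])
sizes = np.array([s for _, s in samples])

# Heavy tails show up as a large gap between median and maximum.
print("median duration:", np.median(durations), "max:", durations.max())
print("median size:", np.median(sizes), "max:", sizes.max())
```

Histogramming `durations` and `sizes` on logarithmic bins is one way to compare such a toy walk against the exponents reported for the Brownian-excursion regime. The cap `max_steps` truncates the longest avalanches, so the extreme tail of this sketch is not reliable.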

Paper Structure

This paper contains 14 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Avalanche statistics obtained from stochastic DIP for the two activations $\Phi_{D_1}$ and $\Phi_{D_2}$ defined in Tab. 1. Top row (a–d): $\Phi_{D_1}$, predicted to lie in the DP regime, shows power-law size $S$ and duration $D$ distributions with exponents $\tau_s \approx 3/2$ and $\tau_d \approx 2$, a crackling relation $\gamma \approx 2$ (obtained by fitting the average size $\langle S\rangle_D$ against duration), and a universal shape collapse. Bottom row (e–h): $\Phi_{D_2}$, predicted to yield Brownian-excursion avalanches, exhibits exponents $(\tau_s, \tau_d, \gamma) \approx (4/3, 3/2, 3/2)$ and the expected parabolic shape collapse. The results confirm that modifying the activation’s Taylor coefficients steers the universality class of random deep networks.
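
The two exponent triples quoted in the caption can be checked for internal consistency. The sketch below assumes the standard crackling-noise scaling relation $\gamma = (\tau_d - 1)/(\tau_s - 1)$, which the caption does not state explicitly. The helper name `crackling_gamma` is hypothetical.

```python
from fractions import Fraction

def crackling_gamma(tau_s, tau_d):
    """Exponent gamma in <S>_D ~ D**gamma, assuming the standard
    scaling relation gamma = (tau_d - 1) / (tau_s - 1)."""
    return (tau_d - 1) / (tau_s - 1)

# Exponents quoted in the caption for the two regimes.
gamma_dp = crackling_gamma(Fraction(3, 2), Fraction(2))      # DP regime
gamma_bm = crackling_gamma(Fraction(4, 3), Fraction(3, 2))   # Brownian excursion

print(gamma_dp)  # 2
print(gamma_bm)  # 3/2
```

Under that assumption, both triples satisfy the relation exactly, matching the fitted values $\gamma \approx 2$ and $\gamma \approx 3/2$ reported in the caption.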