The Emergence of Spectral Universality in Deep Networks

Jeffrey Pennington; Samuel S. Schoenholz; Surya Ganguli

TL;DR

The paper develops a free-probability framework that exactly characterizes the full singular value spectrum of a deep network's input-output Jacobian at initialization. By deriving a master equation that ties this spectrum to transforms of the nonlinearity and of the weight ensemble, it shows how depth, nonlinearity, and initialization jointly determine how tightly the singular values concentrate around one. It uncovers two universal limiting spectral laws (Bernoulli-like and smooth) that emerge under a double-scaling regime with orthogonal weights, and shows that orthogonality is essential for stable universality. These results give principled guidance for choosing nonlinearities and weight initializations that achieve dynamical isometry and fast learning in very deep networks.

Abstract

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. To guide important design choices, it is therefore essential to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.
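
To make the central object concrete, the following is a minimal numerical sketch, not the authors' code, of the input-output Jacobian J = D_L W_L ... D_1 W_1 of a randomly initialized fully connected network, where D_l is the diagonal matrix of nonlinearity derivatives at layer l. The function names, the parameters (`depth`, `width`, `sigma_w`), and the choice to omit biases are all assumptions made for illustration. It compares orthogonal and Gaussian weights by computing the singular values of J directly with NumPy.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fixing the column signs makes the sample Haar-distributed.
    return q * np.sign(np.diag(r))


def jacobian_singular_values(depth, width, init, sigma_w=1.0,
                             phi=None, dphi=None, seed=0):
    """Singular values of the input-output Jacobian at initialization.

    Illustrative assumptions: square layers, no biases, and a linear
    network when phi is None.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)   # random input with |x|^2 / width ~ 1
    jac = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            w = sigma_w * random_orthogonal(width, rng)
        elif init == "gaussian":
            # Entries with variance sigma_w^2 / width.
            w = sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)
        else:
            raise ValueError(f"unknown init: {init}")
        h = w @ x                    # pre-activations of this layer
        if phi is None:
            x, d = h, np.ones(width)
        else:
            x, d = phi(h), dphi(h)
        jac = (d[:, None] * w) @ jac  # left-multiply by D_l W_l
    return np.linalg.svd(jac, compute_uv=False)


# Linear networks: orthogonal weights give an exactly isometric Jacobian,
# while Gaussian weights spread the singular values out as depth grows.
s_orth = jacobian_singular_values(20, 64, "orthogonal")
s_gauss = jacobian_singular_values(20, 64, "gaussian")
print("orthogonal:", s_orth.min(), s_orth.max())
print("gaussian:  ", s_gauss.min(), s_gauss.max())

# A nonlinear case: tanh with orthogonal weights.
s_tanh = jacobian_singular_values(
    20, 64, "orthogonal", phi=np.tanh, dphi=lambda h: 1.0 - np.tanh(h) ** 2
)
print("tanh:      ", s_tanh.min(), s_tanh.max())
```

In the linear orthogonal case every singular value equals one up to floating-point error, since a product of orthogonal matrices is orthogonal. With tanh and `sigma_w = 1`, each factor D_l W_l has operator norm at most one, so no singular value can exceed one; how far the spectrum falls below one depends on the depth and on how the weight and bias scales are tuned, which is the dependence the paper analyzes.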
