Genus expansion for non-linear random matrix ensembles with applications to neural networks
Nicola Muca Cirone, Jad Hamdan, Cristopher Salvi
TL;DR
The paper develops a unified genus-expansion framework for analysing non-linear random matrix ensembles arising from randomly initialised neural networks, built on a graphical language of operator and product graphs that linearises the effect of the activation functions. It gives a new proof of the Gaussian process limit of wide networks, quantifies the rate at which the NTK converges to its deterministic limit, and computes the spectral moments of the Jacobian as sums over non-crossing partitions, with extensions to non-Gaussian, sparse, and complex weights. The key methodological contributions are the tree-based expansion of neural networks, a genus expansion enabled by Wick's theorem, and the connection between Jacobian moments and Fuss–Catalan-type recursions, which together yield a versatile toolkit for leading-order universal limits in deep networks. The results have practical implications for initialisation design and for understanding training dynamics in wide networks, including sparse and non-Gaussian weight regimes.
Abstract
We present a unified approach to studying certain non-linear random matrix ensembles and associated random neural networks at initialization. This begins with a novel series expansion for neural networks which generalizes Faà di Bruno's formula to an arbitrary number of compositions. The role of monomials is played by random multilinear maps indexed by directed graphs, whose edges correspond to random matrices. Crucially, this expansion linearizes the effect of the activation functions, allowing for the direct application of Wick's principle and the genus expansion technique. As an application, we prove several results about neural networks with random weights. We first give a new proof of the fact that they converge to Gaussian processes as their width tends to infinity. Secondly, we quantify the rate of convergence of the Neural Tangent Kernel to its deterministic limit in Frobenius norm. Finally, we compute the moments of the limiting spectral distribution of the Jacobian (only the first two of which were previously known), expressing them as sums over non-crossing partitions. All of these results are then generalized to the case of neural networks with sparse and non-Gaussian weights, under moment assumptions.
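To make the phrase "Wick's principle and the genus expansion technique" concrete, the following is a minimal sketch of the classical genus expansion for a single Gaussian Hermitian (GUE) matrix. It is not the paper's construction for non-linear ensembles. It assumes the standard textbook statement that, for an N×N GUE matrix X with E|X_ij|^2 = 1/N, the normalized moment (1/N) E Tr X^{2k} equals the sum of N^{-2g} over all pairings of the 2k edges of a polygon, where g is the genus of the surface obtained by gluing paired edges. The function names `pairings`, `genus` and `genus_counts` are illustrative and do not come from the paper.

```python
from collections import Counter


def pairings(points):
    """Yield every perfect matching of `points` (a list of even length)."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def genus(pairing, k):
    """Genus of the surface obtained by gluing a 2k-gon along `pairing`.

    The glued surface has one face and k edges, and its vertices are the
    cycles of the map j -> pi(j) + 1 (mod 2k), where pi swaps the two
    elements of each pair. Euler's formula V - E + F = 2 - 2g then gives
    g = (k + 1 - V) / 2.
    """
    n = 2 * k
    pi = {}
    for a, b in pairing:
        pi[a], pi[b] = b, a
    seen, vertices = set(), 0
    for start in range(n):
        if start in seen:
            continue
        vertices += 1
        j = start
        while j not in seen:
            seen.add(j)
            j = (pi[j] + 1) % n
    return (k + 1 - vertices) // 2


def genus_counts(k):
    """Count the pairings of 2k points by genus (Wick pairings of Tr X^{2k})."""
    return Counter(genus(p, k) for p in pairings(list(range(2 * k))))


if __name__ == "__main__":
    # k = 2: two non-crossing pairings (genus 0) and one crossing pairing
    # (genus 1), so (1/N) E Tr X^4 = 2 + 1/N^2 under the assumed normalization.
    print(dict(genus_counts(2)))  # {0: 2, 1: 1}
```

The genus-zero count is the number of non-crossing pairings, so only those survive as N tends to infinity. The abstract's expression of the Jacobian moments as sums over non-crossing partitions is the analogous leading-order statement for the paper's own ensembles.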
