Genus expansion for non-linear random matrix ensembles with applications to neural networks

Nicola Muca Cirone; Jad Hamdan; Cristopher Salvi

TL;DR

The paper develops a unified genus-expansion framework to analyse non-linear random matrix ensembles arising from randomly initialised neural networks, by introducing a graphical language of operator and product graphs that linearises activations. It proves a Gaussian process limit for sparse networks, quantifies the NTK convergence rate to a deterministic limit, and computes the Jacobian’s spectral moments via non-crossing partitions, with extensions to non-Gaussian, sparse, and complex weights. The key methodological contributions are the tree-based neural network expansions, Wick’s theorem-enabled genus expansion, and the connection of Jacobian moments to Fuss–Catalan-type recursions, which together yield a versatile toolkit for first-order universal limits in deep networks. The results have practical implications for initialization design and understanding training dynamics in wide networks, including sparse or non-Gaussian weight regimes.
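As a rough illustration of the combinatorial objects mentioned above, the following Python sketch counts non-crossing partitions by brute force and evaluates Fuss–Catalan numbers from their closed form. It is not the paper's moment formula for the Jacobian; it only covers the classical linear case, where the Fuss–Catalan numbers $\frac{1}{pk+1}\binom{(p+1)k}{k}$ are the limiting moments of the squared singular values of a product of $p$ independent square Gaussian matrices. All function names are hypothetical.

```python
from math import comb


def fuss_catalan(p, k):
    """Fuss-Catalan number FC_p(k) = binom((p+1)k, k) / (pk + 1); p = 1 gives Catalan numbers."""
    return comb((p + 1) * k, k) // (p * k + 1)


def set_partitions(elements):
    """Yield every partition of a list into blocks (each block is a list)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def is_noncrossing(partition):
    """True if no a < b < c < d exist with a, c in one block and b, d in a different block."""
    block_of = {x: idx for idx, block in enumerate(partition) for x in block}
    pts = sorted(block_of)
    n = len(pts)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                for d in range(c + 1, n):
                    same_ac = block_of[pts[a]] == block_of[pts[c]]
                    same_bd = block_of[pts[b]] == block_of[pts[d]]
                    if same_ac and same_bd and block_of[pts[a]] != block_of[pts[b]]:
                        return False
    return True


def count_noncrossing(n):
    """Number of non-crossing partitions of {0, ..., n-1}, by exhaustive enumeration."""
    return sum(is_noncrossing(p) for p in set_partitions(list(range(n))))


if __name__ == "__main__":
    print([count_noncrossing(n) for n in range(1, 6)])
    print([fuss_catalan(1, k) for k in range(1, 6)])
    print([fuss_catalan(2, k) for k in range(1, 5)])
```

The brute-force count agrees with `fuss_catalan(1, n)`, the Catalan numbers; the enumeration is exponential in `n` and is meant only for small sanity checks.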

Abstract

We present a unified approach to studying certain non-linear random matrix ensembles and associated random neural networks at initialization. This begins with a novel series expansion for neural networks which generalizes Faà di Bruno's formula to an arbitrary number of compositions. The role of monomials is played by random multilinear maps indexed by directed graphs, whose edges correspond to random matrices. Crucially, this expansion linearizes the effect of the activation functions, allowing for the direct application of Wick's principle and the genus expansion technique. As an application, we prove several results about neural networks with random weights. We first give a new proof of the fact that they converge to Gaussian processes as their width tends to infinity. Secondly, we quantify the rate of convergence of the Neural Tangent Kernel to its deterministic limit in Frobenius norm. Finally, we compute the moments of the limiting spectral distribution of the Jacobian (only the first two of which were previously known), expressing them as sums over non-crossing partitions. All of these results are then generalised to the case of neural networks with sparse and non-Gaussian weights, under moment assumptions.
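To make the phrase "Wick's principle and the genus expansion technique" concrete, here is a minimal Python sketch for the classical GUE setting, not for the paper's non-linear ensembles. For an $N\times N$ GUE matrix $W$ with $\mathbb{E}|W_{ij}|^2 = 1/N$, Wick's formula gives $\mathbb{E}\,\frac{1}{N}\mathrm{tr}\,W^{2k} = \sum_{\pi} N^{-2g(\pi)}$, where $\pi$ runs over pairings of the $2k$ edges of a polygon and $g(\pi)$ is the genus of the surface obtained by gluing paired edges. The helper names (`pairings`, `genus`, `genus_counts`) are hypothetical.

```python
from collections import Counter


def pairings(points):
    """Yield all perfect matchings of a list of points, each as a list of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def genus(pairing, n):
    """Genus of the surface obtained by gluing the n edges of a polygon along `pairing`."""
    pi = {}
    for a, b in pairing:
        pi[a], pi[b] = b, a
    # Vertices of the glued surface correspond to cycles of i -> pi(i) + 1 (mod n).
    seen, vertices = set(), 0
    for start in range(n):
        if start in seen:
            continue
        vertices += 1
        i = start
        while i not in seen:
            seen.add(i)
            i = (pi[i] + 1) % n
    # Euler characteristic: V - E + F = 2 - 2g with E = n / 2 glued edges and F = 1 face.
    return (n // 2 + 1 - vertices) // 2


def genus_counts(k):
    """Map genus -> number of pairings of a 2k-gon with that genus."""
    n = 2 * k
    return dict(Counter(genus(p, n) for p in pairings(list(range(n)))))


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, genus_counts(k))
```

The genus-zero counts are the Catalan numbers, so only planar pairings survive as $N \to \infty$; the higher-genus terms give the $N^{-2g}$ corrections.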

Paper Structure (40 sections, 41 theorems, 254 equations, 27 figures, 1 table)

Introduction
Randomly initialised neural networks and non-linear random matrix theory
Main results
Overview of our method
A graphical language for neural network computations
Graph expansions of neural networks
Wick’s principle and genus expansion
Possible generalizations
Random biases
Non-polynomial activations
Notation and nomenclature
Graphical descriptions of analytic operations
Operator graphs and their associated linear map
Operations on graphs
Neural network expansions
...and 25 more sections

Key Result

Theorem 1

Let $N_\ell = N$ when $\ell>0$, and assume that each $W_\ell$ has i.i.d. entries drawn from a symmetric, centred distribution with finite moments and variance $\frac{1}{N}\mathbf{1}(\ell>0)+\mathbf{1}(\ell=0)$. Then for any $M,L \geq 1$ we have [...] where the right-hand side is a Gaussian Process indexed on $\mathbb{R}^{N_0}$, with diagonal covariance function defined by [...] and $\mathrm{Id}_M$ is the $M$ [...]

Figures (27)

Figure 1: A product graph giving rise to a word of matrices multiplied by vectors on either side. $\mathbf{1}_{N}$ is the $N\times 1$ vector of ones.
Figure 2: Trees give rise to Hadamard products.
Figure 3: A simple product graph leading to a more involved analytical expression for its value.
Figure 4: A simple product graph $G$ (top left), from which we can free vertices to make $\mathbf{W}_G$ either a vector, a scalar-valued map, or a vector-valued map (top right, bottom left, and bottom right respectively).
Figure 5: Example of an operator graph (right) obtained from a graph (left) by a choice of $\mathcal{F}_{\mathrm{in}}$, $\mathcal{F}_{\mathrm{out}}$, $\mathfrak{d}$ and $(\mathbf{X}_c)_{c \notin \mathcal{F}}$. Note how on the right, we omit labels from vertices and edges whose inputs are $\mathbf{1}$ or $\mathbf{I}$ respectively.
...and 22 more figures

Theorems & Definitions (109)

Theorem 1: Gaussian process limit for sparse neural networks
proof
Theorem 2: Convergence in $L^2$ of the NTK at initialisation
proof
Theorem 3: Moments of the limiting spectral distribution of the Jacobian
proof
Remark
Definition 1: Product graph
Definition 2
Remark
...and 99 more
