Global law of conjugate kernel random matrices with heavy-tailed weights

Alice Guionnet, Vanessa Piccolo

TL;DR

The paper extends the spectral analysis of conjugate kernel random matrices $M=YY^{\top}$ with $Y=f(WX)$ to heavy-tailed weight matrices $W$, showing that the limiting eigenvalue distribution is nonuniversal and depends on the tail index and input law. By deploying a traffic-probability framework and a refined moment method, the authors derive explicit limiting moments $m_k$ expressed through combinatorial graph functionals $C_d(f)$ and $C_{(W_k)}(f)$, and they identify which graph configurations contribute via admissible block-tree structures. In the Gaussian-weight special case, the results recover known universality and reduce to familiar expressions; in general heavy-tailed settings, the limiting law remains light-tailed but nonuniversal, reflecting strong dependencies induced by the activation and heavy-tailed weights. The work also establishes almost-sure convergence of the empirical spectral measure under mild growth assumptions, highlighting the practical impact for understanding spectral properties of nonlinear random feature models with heavy-tailed weights.

Abstract

We study the asymptotic spectral distribution of the conjugate kernel random matrix $YY^\top$, where $Y= f(WX)$ arises from a two-layer neural network model. We consider the setting where $W$ and $X$ are random rectangular matrices with i.i.d. entries, where the entries of $W$ follow a heavy-tailed distribution, while those of $X$ have light tails. Our assumptions on $W$ include a broad class of heavy-tailed distributions, such as symmetric $\alpha$-stable laws with $\alpha\in ]0,2[$ and sparse matrices with $\mathcal{O}(1)$ nonzero entries per row. The activation function $f$, applied entrywise, is bounded, smooth, odd, and nonlinear. We compute the limiting eigenvalue distribution of $YY^\top$ through its moments and show that heavy-tailed weights induce strong correlations between the entries of $Y$, resulting in richer and fundamentally different spectral behavior compared to the light-tailed case.
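To make the model in the abstract concrete, here is a minimal NumPy sketch of a conjugate kernel matrix built from sparse weights. It is an illustration under stated assumptions, not the authors' construction: the helper names `sparse_weights` and `conjugate_kernel`, the Bernoulli($c/n$) sparsity pattern with $\pm 1$ values, the choice $f = \tanh$, the dimensions, and the $1/\sqrt{m}$ normalization of $Y$ are all hypothetical choices.

```python
import numpy as np

def sparse_weights(p, n, c, rng):
    """Hypothetical sparse model: each entry is nonzero with probability c/n,
    so a row has about c nonzero entries on average; nonzero values are +/-1."""
    mask = rng.random((p, n)) < c / n
    signs = rng.choice([-1.0, 1.0], size=(p, n))
    return mask * signs

def conjugate_kernel(W, X, f=np.tanh):
    """Return M = Y Y^T with Y = f(W X) / sqrt(m).
    The 1/sqrt(m) scaling is an assumption made for this sketch."""
    m = X.shape[1]
    Y = f(W @ X) / np.sqrt(m)
    return Y @ Y.T

rng = np.random.default_rng(0)
p, n, m = 130, 200, 200                      # illustrative sizes only
W = sparse_weights(p, n, c=3.0, rng=rng)     # heavy-tailed (sparse) weights
X = rng.standard_normal((n, m))              # light-tailed inputs
M = conjugate_kernel(W, X)                   # p x p symmetric matrix
eigs = np.linalg.eigvalsh(M)                 # real eigenvalues, ascending
```

Because $\tanh$ is bounded by 1, every entry of $Y$ has absolute value at most $1/\sqrt{m}$, so the trace of $M$ is at most $p$ in this sketch.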

Paper Structure

This paper contains 15 sections, 22 theorems, 226 equations, 7 figures.

Key Result

Theorem 1.4

Under Assumptions hyp1-hyp3, for every integer $k \in \mathbb{N}$, there exists a real number $m_k$, depending only on $\phi,\psi,f, \Phi$, and $\nu_x$, such that the $k$-th matrix moment of $M$ converges to $m_k$ in the large-dimensional limit, where the convergence holds both in expectation and in probability. The limiting moment $m_k$ is given explicitly in Proposition prop: main cycle.
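The quantity whose limit the theorem describes can be estimated numerically. The sketch below assumes that the $k$-th matrix moment is the normalized trace $\frac{1}{p}\operatorname{tr} M^k$ for a $p \times p$ matrix $M$, which is the $k$-th moment of the empirical eigenvalue distribution. The helper `empirical_moments`, the Cauchy weights scaled by $1/n$, the $1/\sqrt{m}$ normalization, and the dimensions are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def empirical_moments(M, k_max):
    """Return [(1/p) tr(M^k) for k = 1..k_max] for a symmetric p x p matrix M,
    computed as the average of the k-th powers of its eigenvalues."""
    eigs = np.linalg.eigvalsh(M)
    return [float(np.mean(eigs ** k)) for k in range(1, k_max + 1)]

rng = np.random.default_rng(1)
p, n, m = 60, 80, 90                          # illustrative sizes only
# Cauchy entries are symmetric 1-stable; the 1/n scaling is an assumption.
W = rng.standard_cauchy((p, n)) / n
X = rng.standard_normal((n, m))
Y = np.arctan(W @ X) / np.sqrt(m)             # assumed normalization of Y
M = Y @ Y.T

moments = empirical_moments(M, 4)             # estimates for k = 1, ..., 4
```

Averaging eigenvalue powers avoids forming $M^k$ explicitly. Checking the first two values against `np.trace(M) / p` and `np.trace(M @ M) / p` is a quick sanity test.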

Figures (7)

  • Figure 1: Eigenvalue histogram of $M=Y_mY_m^\top$ for the activation function $f(x) = \arctan(x)$. The weight distribution $\nu_w$ follows a symmetric $\alpha$-stable distribution with $\sigma = 1$ and different values of $\alpha \in ]0,2]$, while $\nu_x$ is the standard normal distribution. Numerical experiments were conducted with $m=n=10000$ and $p=6500$.
  • Figure 2: An example of a connected bipartite graph $G = (W \cup V, E)$, together with three distinct collections of subgraphs obtained by choosing subsets $W_1, \ldots, W_K$ of $W$ for $K \in \{2,3\}$, as described in Definition \ref{def: subgraphs}.
  • Figure 3: An example of an admissible graph $G = (W \cup V, E)$ with three blocks. Block $B_1 = G(\{w_1\})$ is a simple cycle of length $2$; block $B_2 = G(\{w_2, w_3, w_4, w_5, w_6\})$ is a cactus graph consisting of two simple cycles connected at the vertex $w_3$; and block $B_3 = G(\{w_7\})$ is a double tree. The separating vertices are $v_1$ and $v_5$. In particular, $G$ is a cactus graph. Block $B_2$ admits two admissible decompositions: $\{\{w_2, w_3, w_4, w_5, w_6\}\} \in \mathcal{A}_1(\{w_2, w_3, w_4, w_5, w_6\})$ and $\{\{w_2, w_3, w_4, w_5\} , \{w_3, w_6\} \} \in \mathcal{A}_2(\{w_2, w_3, w_4, w_5, w_6\})$.
  • Figure 4: A simple bipartite cycle of length $12$.
  • Figure 5: The graph $G^\pi$ obtained from the simple cycle $G$ in Figure \ref{fig3}, associated with the noncrossing partition $\pi=\{\{v_1,v_5,v_6\},\{v_2,v_4\},\{v_3\}\}$. The vertices $\tilde{v}_1$ and $\tilde{v}_2$ are obtained by merging $v_1,v_5,v_6$ and $v_2,v_4$, respectively. The subsets $W_1^\pi=\{w_1,w_5\}$, $W_2^\pi=\{w_3,w_4\}$, and $W_3^\pi=\{w_6\}$ form the finest partition of $W^\pi$.
  • ...and 2 more figures
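The experiment described in the Figure 1 caption can be approximated at small scale. The sketch below samples symmetric $\alpha$-stable weights with the Chambers-Mallows-Stuck method and computes the eigenvalues of $Y_mY_m^\top$ for $f = \arctan$. Several choices are assumptions, not taken from the source: the function names, the $n^{-1/\alpha}$ scaling of $W$, the normalization $Y_m = f(WX)/\sqrt{m}$, and the reduced dimensions (which keep the caption's ratios $p/n = 0.65$ and $m = n$).

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for a symmetric alpha-stable law
    with scale 1 and 0 < alpha <= 2 (alpha = 1 is Cauchy, alpha = 2 is N(0, 2))."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    E = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / E) ** ((1 - alpha) / alpha))

def spectrum(alpha, p=65, n=100, m=100, seed=0):
    """Eigenvalues of Y_m Y_m^T with Y_m = arctan(W X) / sqrt(m).
    The n^(-1/alpha) scaling of W and the 1/sqrt(m) factor are assumptions."""
    rng = np.random.default_rng(seed)
    W = symmetric_stable(alpha, (p, n), rng) / n ** (1 / alpha)
    X = rng.standard_normal((n, m))
    Y = np.arctan(W @ X) / np.sqrt(m)
    return np.linalg.eigvalsh(Y @ Y.T)

# One spectrum per tail index; each can be passed to a histogram routine.
spectra = {alpha: spectrum(alpha) for alpha in (0.5, 1.0, 1.5, 2.0)}
```

Comparing histograms of `spectra[alpha]` across values of `alpha` gives a rough, small-sample impression of how the spectrum changes with the tail index; the dimensions here are far smaller than those reported in the caption.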

Theorems & Definitions (72)

  • Theorem 1.4: Convergence of matrix moments
  • Theorem 1.6: Global law
  • Definition 1.7: Traffic trace [male2020]
  • Theorem 1.8: Convergence in traffic distribution
  • Definition 2.1: Induced subgraphs
  • Example 2.2
  • Definition 2.3: Block structure of connected bipartite multigraphs
  • Remark 2.4
  • Definition 2.5: Cactus graph and double tree
  • Example 2.6: Example \ref{ex1} continued
  • ...and 62 more