From Density Matrices to Phase Transitions in Deep Learning: Spectral Early Warnings and Interpretability

Max Hennick, Guillaume Corlouer

Abstract

A key problem in the modern study of AI is predicting and understanding emergent capabilities in models during training. Inspired by methods for studying reactions in quantum chemistry, we present the "2-datapoint reduced density matrix" (2RDM). We show that this object provides a computationally efficient, unified observable of phase transitions during training. By tracking the eigenvalue statistics of the 2RDM over a sliding window, we derive two complementary signals: the spectral heat capacity, which we prove provides early warning of second-order phase transitions via critical slowing down, and the participation ratio, which reveals the dimensionality of the underlying reorganization. Remarkably, the top eigenvectors of the 2RDM are directly interpretable, making it straightforward to study the nature of the transitions. We validate the approach across four distinct settings: deep linear networks, induction head formation, grokking, and emergent misalignment. We then discuss directions for future work using the 2RDM.

Paper Structure

This paper contains 44 sections, 15 theorems, 93 equations, 16 figures, 11 tables, and 1 algorithm.

Key Result

Proposition 2.1

Let $\rho_{\theta^0}$ be a distribution about a given parameter $\theta^0$. The loss covariance under $\rho_{\theta^0}$ is approximated to leading order by $\mathrm{Cov}[\boldsymbol{\ell}] \approx G \Sigma G^\top$, where $\Sigma$ is the covariance of the distribution and $G \in \mathbb{R}^{n \times p}$ is the Jacobian matrix with rows $g_i^\top = \nabla_\theta \ell(x_i)^\top$, for $\dim(\theta)=p$ and $n$ samples.
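As a minimal sanity check of this leading-order (delta-method) approximation, the sketch below compares the empirical loss covariance with $G \Sigma G^\top$ on a toy quadratic loss. The quadratic loss, the Gaussian choice of $\rho_{\theta^0}$, and all names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                               # n datapoints, p = dim(theta)
theta0 = rng.normal(size=p)               # reference parameter theta^0
A = rng.normal(size=(n, p))               # rows a_i define a toy loss ell(x_i; theta) = 0.5*(a_i . theta)^2

# Jacobian G with rows g_i^T = grad_theta ell(x_i)^T at theta^0:
# grad of 0.5*(a_i . theta)^2 is (a_i . theta) * a_i
G = A * (A @ theta0)[:, None]

# rho_{theta^0}: a narrow Gaussian around theta^0 with covariance Sigma (assumed for the check)
Sigma = 1e-4 * np.eye(p)
thetas = rng.multivariate_normal(theta0, Sigma, size=50_000)

# Empirical covariance of the loss vector vs. the leading-order prediction G Sigma G^T
losses = 0.5 * (thetas @ A.T) ** 2        # shape (samples, n)
empirical = np.cov(losses, rowvar=False)
predicted = G @ Sigma @ G.T
print(np.abs(empirical - predicted).max() / np.abs(predicted).max())
```

The relative discrepancy is small and shrinks as $\Sigma$ is made narrower (which suppresses the neglected higher-order terms) and as more samples are drawn (which suppresses Monte Carlo error).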

Figures (16)

  • Figure 1: A schematic for using the 2RDM. Losses for 60 probes in three groups are given by $\ell(t) = a(t)\,\mathbf{v}_1 + \boldsymbol{\varepsilon}_t$, with $\mathbf{v}_1$ supported only on Group B. (A) Per-probe losses; the black rectangle marks the sliding window used to estimate the covariance. (B) Covariance snapshots: block structure in the Group B sub-matrix appears only during the transition. (C) The spectral heat capacity (red) spikes before the structural coordinate (blue) completes the transition. (D) The top eigenvector of $C$ at the SHC peak localizes on Group B, identifying the participating probes without prior knowledge of the group structure. (A minimal code sketch of this setup appears after the figure list below.)
  • Figure 2: The average lag (time between the SHC spike and the mode alignment) computed across 30 random seeds. Note the slight increase near the far right. This corresponds to the alignment of another nearby mode.
  • Figure 3: Participation ratio and SHC for a single DLN training run.
  • Figure 4: These plots show that the SHC consistently provides early detection of induction head formation.
  • Figure 5: The block energy between the random and structured blocks.
  • ...and 11 more figures
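The following sketch reproduces the Figure 1 pipeline in miniature, following the caption's setup $\ell(t) = a(t)\,\mathbf{v}_1 + \boldsymbol{\varepsilon}_t$ with $\mathbf{v}_1$ supported only on Group B and the covariance estimated over a sliding window. The sigmoid ramp $a(t)$, the noise scale, the window length, and the standard participation-ratio formula $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ are assumptions for illustration; the spectral heat capacity (Definition 2.2) is omitted because its definition is not reproduced in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_probes, window = 400, 60, 40              # timeline, probe count, window length (assumed)
groups = np.repeat([0, 1, 2], 20)              # three groups of 20 probes: A, B, C

# v_1 supported only on Group B (index 1), as in the Figure 1 caption
v1 = np.where(groups == 1, 1.0, 0.0)
v1 /= np.linalg.norm(v1)

# a(t): an assumed smooth ramp standing in for the transition; eps_t: per-probe noise
t = np.arange(T)
a = 1.0 / (1.0 + np.exp(-(t - 200) / 15.0))
losses = a[:, None] * v1[None, :] + 0.02 * rng.normal(size=(T, n_probes))

def window_covariance(ell, end, w):
    """Covariance of per-probe losses over the sliding window ending at `end`."""
    return np.cov(ell[end - w:end], rowvar=False)

def participation_ratio(C):
    """Standard participation ratio (sum lambda)^2 / sum lambda^2 of the covariance spectrum."""
    lam = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

for end in (100, 210, 350):                    # windows before, during, and after the transition
    C = window_covariance(losses, end, window)
    lam, vecs = np.linalg.eigh(C)
    top = vecs[:, -1]                          # top eigenvector of the windowed covariance
    mass_on_B = float((top[groups == 1] ** 2).sum())
    print(f"t={end}: PR={participation_ratio(C):.1f}, top-eigvec mass on Group B={mass_on_B:.2f}")
```

With these toy settings, the window that straddles the transition shows a sharp drop in the participation ratio and a top eigenvector whose squared mass concentrates on the Group B probes, mirroring panels (B) and (D); windows before and after the transition show neither.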

Theorems & Definitions (29)

  • Definition 2.1: 2-datapoint reduced density matrix (2RDM)
  • Proposition 2.1
  • Definition 2.2: Spectral Heat Capacity
  • Proposition 2.2: Subspace Alignment (informal)
  • Lemma B.1: Weight-Space Cluster Expansion
  • proof
  • Proposition C.1
  • proof
  • Corollary C.1
  • Corollary C.2
  • ...and 19 more