Neural Networks as Spin Models: From Glass to Hidden Order Through Training

Richard Barney, Michael Winer, Victor Galitski

TL;DR

This work builds a precise mapping between neural networks and multi-layer Sherrington-Kirkpatrick spin glasses by treating neurons as Ising spins and training weights as spin couplings, thereby representing training as a time-parametrized family of Hamiltonians. Using replica analysis, it identifies a finite-temperature spin-glass to paramagnet transition in the initial state, with a closed-form estimate $T_c=[2C\cos(\pi/(L+2))]^{1/2}$ for identical layers, and employs TAP equations to analyze single instances of trained systems. As training proceeds on MNIST with both binarized and ReLU networks, the critical temperature $T_c(t)$ grows roughly as a power law, and the spectrum of the bond matrix develops tails that signal a transition from a glassy landscape with many TAP solutions to a single hidden-order solution aligned with the training task. The results offer a unifying, physics-based lens on NN training, link rich learning dynamics to spectral features of the weight matrix, and point to potential extensions into quantum-inspired and neuromorphic platforms.
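The closed-form estimate can be cross-checked against a small eigenvalue problem. The sketch below is our reading, not the paper's code. It assumes that C sets the coupling strength between adjacent layers, and that the replica instability condition reduces to β² times the largest eigenvalue of an (L+1)×(L+1) tridiagonal "layer" matrix with C on its off-diagonals. Under that assumption, the path-graph identity λ_max = 2 cos(π/(L+2)) reproduces the formula. The function names are hypothetical.

```python
import numpy as np


def tc_closed_form(C: float, L: int) -> float:
    """Closed-form estimate T_c = [2 C cos(pi / (L + 2))]^(1/2) for L + 1 identical layers."""
    return float(np.sqrt(2.0 * C * np.cos(np.pi / (L + 2))))


def tc_from_layer_matrix(C: float, L: int) -> float:
    """Assumed equivalent route (hypothetical helper): T_c^2 is the largest eigenvalue
    of the (L+1)x(L+1) tridiagonal layer-coupling matrix with C on the first
    off-diagonals, because each layer couples only to its nearest neighbours."""
    n_layers = L + 1
    A = C * (np.eye(n_layers, k=1) + np.eye(n_layers, k=-1))
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return float(np.sqrt(np.linalg.eigvalsh(A)[-1]))


for L in (1, 2, 5, 50):
    print(L, tc_closed_form(1.0, L), tc_from_layer_matrix(1.0, L))
```

For L = 1 the estimate gives T_c = √C. As L grows, cos(π/(L+2)) → 1, so T_c approaches √(2C) from below.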

Abstract

We explore a one-to-one correspondence between a neural network (NN) and a statistical mechanical spin model where neurons are mapped to Ising spins and weights to spin-spin couplings. The process of training an NN produces a family of spin Hamiltonians parameterized by training time. We study the magnetic phases and the melting transition temperature as training progresses. First, we prove analytically that the common initial state before training--an NN with independent random weights--maps to a layered version of the classical Sherrington-Kirkpatrick spin glass exhibiting replica symmetry breaking. The spin-glass-to-paramagnet transition temperature is calculated. Further, we use the Thouless-Anderson-Palmer (TAP) equations--a theoretical technique for analyzing the landscape of energy minima of random systems--to determine the evolution of the magnetic phases on two types of NNs (one with continuous and one with binarized activations) trained on the MNIST dataset. The two NN types give rise to similar results, showing a quick destruction of the spin glass and the appearance of a phase with a hidden order, whose melting transition temperature $T_c$ grows as a power law in training time. We also discuss the properties of the spectrum of the spin system's bond matrix in the context of rich vs. lazy learning. We suggest that this statistical mechanical view of NNs provides a useful unifying perspective on the training process, which can be viewed as selecting and strengthening a symmetry-broken state associated with the training task.
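A minimal sketch of the neuron-to-spin mapping follows. It assumes the Hamiltonian H(s) = -Σ_k s^(k)ᵀ J^(k) s^(k-1) with Ising spins s = ±1. It also ignores biases and activation functions; both are our simplifications, and the paper's exact Hamiltonian may differ. Following the Figure 2 convention, J^(k) has shape (N_k, N_{k-1}) and connects layer k-1 to layer k. The layer weights are assembled into one symmetric bond matrix W, the object whose spectrum the abstract discusses. The function names and the 1/N_{k-1} weight variance used for the "untrained" example are illustrative assumptions.

```python
import numpy as np


def bond_matrix(weights):
    """Assemble the symmetric bond matrix of the layered spin model (hypothetical helper).

    weights[k - 1] is J^(k) with shape (N_k, N_{k-1}), coupling layer k-1 to layer k.
    Returns W and the index offsets marking where each layer's spins start.
    """
    for prev, nxt in zip(weights, weights[1:]):
        assert nxt.shape[1] == prev.shape[0], "consecutive weight shapes must chain"
    sizes = [weights[0].shape[1]] + [J.shape[0] for J in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    W = np.zeros((offsets[-1], offsets[-1]))
    for k, J in enumerate(weights, start=1):
        rows = slice(offsets[k], offsets[k + 1])  # spins of layer k
        cols = slice(offsets[k - 1], offsets[k])  # spins of layer k-1
        W[rows, cols] = J
        W[cols, rows] = J.T  # keep W symmetric
    return W, offsets


def energy_layered(weights, spins):
    """H(s) = -sum_k s^(k)^T J^(k) s^(k-1); spins[k] holds layer k's +/-1 values."""
    return -sum(spins[k] @ J @ spins[k - 1] for k, J in enumerate(weights, start=1))


def energy_bond(W, s):
    """Same energy from the flat spin vector; the 1/2 undoes double counting in W."""
    return -0.5 * s @ W @ s


# Usage: a small network with independent Gaussian weights (the untrained state).
rng = np.random.default_rng(0)
layer_sizes = [20, 15, 10]
weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
W, offsets = bond_matrix(weights)
spins = [rng.choice([-1.0, 1.0], size=n) for n in layer_sizes]
print(energy_layered(weights, spins), energy_bond(W, np.concatenate(spins)))
```

Keeping both energy functions makes the bookkeeping explicit. The layer-by-layer form mirrors the network, and the flat form exposes W for spectral analysis.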

Paper Structure (10 sections, 25 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: A schematic temperature vs. training epoch phase diagram for a multi-layer spin model with bonds determined by neural network training for temperatures not significantly below the melting temperature. The critical temperature grows as a power law with training time.
  • Figure 2: Schematic representation of a neural network with $L+1$ layers. The $k^\text{th}$ layer contains $N_k$ neurons. $J^{(k)}$ is the weight matrix connecting layer $k-1$ to layer $k$.
  • Figure 3: (Top) The least eigenvalue of the Hessian of the action (the paper's Hessian equation) for the ensemble describing our neural networks before training. (Bottom) The least eigenvalue of $I_N-M$ for a particular system in that ensemble. Both vanish or nearly vanish at the same value of $\beta$.
  • Figure 4: The validation error rates as training progresses. All axes are log-scaled, except that the horizontal axis is linear between 0 and 1. The learning rate for these NNs is $10^{-4}$. Data are collected at the end of each epoch.
  • Figure 5: The least eigenvalue of $I_N-M$ as a function of inverse temperature $\beta$ before and after training the neural networks. The learning rate for these neural networks is $10^{-4}$.
  • ...and 4 more figures
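Figures 3 and 5 track the least eigenvalue of $I_N-M$ as a function of $\beta$. This summary does not define $M$. The sketch below assumes that $M$ is the TAP equations linearized around the paramagnetic solution m = 0, with the Onsager reaction term kept, so that $I_N-M$ = diag(1 + β² Σ_l W_il²) - βW. For two identical layers (L = 1) with weight variance C/N, this reduces approximately to (1 - β√C)². That expression touches zero at β = 1/√C, which is the inverse of the closed-form $T_c$ for L = 1. The choice of $M$, the C/N scaling, and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np


def layered_bond_matrix(weights):
    """Symmetric bond matrix from layer weights J^(k) of shape (N_k, N_{k-1})."""
    sizes = [weights[0].shape[1]] + [J.shape[0] for J in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    W = np.zeros((offsets[-1], offsets[-1]))
    for k, J in enumerate(weights, start=1):
        rows = slice(offsets[k], offsets[k + 1])
        cols = slice(offsets[k - 1], offsets[k])
        W[rows, cols] = J
        W[cols, rows] = J.T
    return W


def least_eig_I_minus_M(W, beta):
    """Least eigenvalue of I - M at the paramagnetic point m = 0.

    Assumed form (hypothetical): M = beta * W - diag(beta^2 * sum_l W_il^2), i.e. the
    TAP equations linearized around m = 0, keeping the Onsager reaction term.
    """
    onsager = beta**2 * np.sum(W**2, axis=1)
    M = beta * W - np.diag(onsager)
    return np.linalg.eigvalsh(np.eye(W.shape[0]) - M)[0]


# Usage: two layers (L = 1) of N spins each, weight variance C / N, untrained.
rng = np.random.default_rng(1)
N, C = 300, 1.0
weights = [rng.normal(0.0, np.sqrt(C / N), size=(N, N))]
W = layered_bond_matrix(weights)
betas = np.linspace(0.5, 1.5, 41)
curve = np.array([least_eig_I_minus_M(W, b) for b in betas])
beta_star = betas[np.argmin(curve)]
print(f"minimum {curve.min():.3f} at beta = {beta_star:.3f}; "
      f"closed form 1/T_c = {1 / np.sqrt(2 * C * np.cos(np.pi / 3)):.3f}")
```

For a finite N the minimum sits close to, but not exactly at, zero and 1/T_c. Finite-size fluctuations of the largest singular value of the random weight matrix shift it slightly.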