A note on the dynamics of extended-context disordered kinetic spin models

Jacob A. Zavatone-Veth, Cengiz Pehlevan

TL;DR

This work introduces extended-context disordered kinetic spin models as analytically tractable toy autoregressive sequence models and analyzes them with dynamical mean-field theory (DMFT) in the large-$N$ limit. The models combine multiple interaction matrices $\{\mathbf{J}_k\}$, one per lag, whose correlations across lags are set by a covariance matrix $\Gamma$. For Ising, Gaussian, and spherical variants, the authors derive self-consistent equations for the temporal correlations $C_{t,t'}$ and show that nonvanishing stationary correlations arise only when the weights at different lags are correlated, that is, when $\Gamma$ has off-diagonal entries. They give explicit DMFT equations for each variant, discuss stability criteria, and develop a reverse-engineering approach that designs the weight correlations to realize a desired stationary correlation $c_\tau$, thereby linking model design to target temporal statistics. The framework offers a configurable sandbox for studying learning dynamics in autoregressive sequence models and connects to broader themes in nonequilibrium disordered systems and teacher–student learning. The results clarify how weight structure across lags shapes long-time correlations and provide practical tools for constructing synthetic data with prescribed temporal dependencies.

Abstract

Inspired by striking advances in language modeling, there has recently been much interest in developing autoregressive sequence models that are amenable to analytical study. In this short note, we consider extensions of simple disordered kinetic glass models from statistical physics. These models have tunable correlations, are easy to sample, and can be solved exactly when the state space dimension is large. In particular, we give an expository derivation of the dynamical mean field theories that describe their asymptotic statistics. We therefore propose that they constitute an interesting set of toy models for autoregressive sequence generation, in which one might study learning dynamics.

Paper Structure

This paper contains 28 sections, 193 equations, 5 figures.

Figures (5)

  • Figure 1: Simulation of an Ising-like model with $\Gamma_{k,k'} = \delta_{k,k'}$ and $K=25$, with $N=5000$. a. State $\mathbf{s}_{t}$ over time. Times before $t=0$ represent the initial condition, which is chosen arbitrarily. b. Slices through the autocorrelation function $C_{t,t-k}$ across time for varying lags $k$, showing that the DMFT accurately predicts the empirically measured correlation from a single simulation. The expectation in the DMFT equations is evaluated numerically using 50-point Gauss-Hermite quadrature. Autocorrelations at all non-zero lags decay over time. c. The empirical autocorrelation function $C_{t,t'}$ from a single numerical simulation, slices of which are shown in panel b. d. The corresponding DMFT prediction for the autocorrelation function.
  • Figure 2: As in Figure 1, but for an Ising-type model with correlated weights $\Gamma_{k,k'} = \delta_{k,k'} + r (1-\delta_{k,k'}) (-1)^{k+k'}$ for $r = 0.1$. Unlike for uncorrelated weights in Figure 1, the autocorrelation does not decay to zero over time.
  • Figure 3: Simulations of a Gaussian model with $\Gamma_{k,k'} = \delta_{k,k'} + r (1-\delta_{k,k'}) (-1)^{k+k'}$ for $r = 0.1$ and $K=4$. Here, $\beta = 0.5$. The top row shows heatmaps of the normalized empirical (a) and DMFT (b) correlation functions $C_{t,s}/\sqrt{C_{t,t}C_{s,s}}$. The bottom row shows the exponential growth of $C_{t,t}$ (c) and slices through the normalized correlation functions (d).
  • Figure 4: Simulations of a spherical model with $\Gamma_{k,k'} = \delta_{k,k'} + r (1-\delta_{k,k'})$ for $r = 0.25$ and $K=25$. Here, $\beta = 1$. The top row shows heatmaps of the empirical (a) and DMFT (b) correlation functions $C_{t,s}$. The bottom row shows slices through the correlation functions (c).
  • Figure 5: Simulations of a spherical model with $\Gamma_{k,k'} = \delta_{k,k'} + r (1-\delta_{k,k'}) (-1)^{k+k'}$ for $r = 0.25$ and $K=25$. Here, $\beta = 1$. The top row shows heatmaps of the empirical (a) and DMFT (b) correlation functions $C_{t,s}$. The bottom row shows slices through the correlation functions (c).
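
The figure captions above rely on two numerical ingredients: weights whose correlations across lags follow $\Gamma_{k,k'} = \delta_{k,k'} + r (1-\delta_{k,k'}) (-1)^{k+k'}$, and Gaussian expectations evaluated by 50-point Gauss-Hermite quadrature. The following is a minimal sketch of both, not the authors' code. It assumes that the entries of each $\mathbf{J}_k$ are independent across index pairs $(i,j)$ with covariance $\Gamma_{k,k'}/N$ across lags; that normalization, and the function names `cross_lag_covariance`, `sample_couplings`, and `gaussian_expectation`, are illustrative choices not taken from the page.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def cross_lag_covariance(K, r, alternating=True):
    """Build Gamma[k, k'] = delta + r * (1 - delta) * sign[k, k'].

    sign is (-1)^(k + k') if alternating (Figures 2, 3, 5), else 1 (Figure 4).
    Setting r = 0 gives the uncorrelated case Gamma = identity (Figure 1).
    """
    k = np.arange(K)
    if alternating:
        sign = (-1.0) ** (k[:, None] + k[None, :])
    else:
        sign = np.ones((K, K))
    gamma = r * sign
    np.fill_diagonal(gamma, 1.0)
    return gamma


def sample_couplings(gamma, N, rng):
    """Draw K matrices J_k (each N x N) with Cov(J_k[i,j], J_k'[i,j]) = Gamma[k,k'] / N.

    ASSUMPTION: the 1/N variance scaling and independence across (i, j)
    are illustrative conventions, not confirmed by the source page.
    """
    K = gamma.shape[0]
    chol = np.linalg.cholesky(gamma)       # requires Gamma positive definite
    g = rng.standard_normal((K, N, N))     # independent standard normals
    # Mix the independent draws across the lag index with the Cholesky factor.
    return np.einsum("km,mij->kij", chol, g) / np.sqrt(N)


def gaussian_expectation(f, n_points=50):
    """Approximate E[f(z)] for z ~ N(0, 1) by Gauss-Hermite quadrature.

    hermegauss uses the weight exp(-x^2 / 2), so its weights sum to
    sqrt(2 * pi) and must be normalized to obtain a probability average.
    """
    x, w = hermegauss(n_points)
    return np.sum(w * f(x)) / np.sqrt(2.0 * np.pi)
```

For example, `sample_couplings(cross_lag_covariance(25, 0.1), 500, np.random.default_rng(0))` returns an array of shape `(25, 500, 500)`, and `gaussian_expectation(lambda z: np.tanh(z) ** 2)` evaluates a one-dimensional Gaussian average of the kind that appears in mean-field equations for Ising-type units. The Cholesky step fails if $\Gamma$ is not positive definite, which for the alternating form with $0 \le r < 1$ does not occur, since its eigenvalues are $1-r$ and $1-r+rK$.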