Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Satadeep Bhattacharjee, Seung-Cheol Lee

TL;DR

The paper tests a physics-inspired spin-bath view of self-attention by extracting GPT-2 Query–Key weights to build head-specific Hamiltonians, deriving logit-gap phase boundaries, and validating them across 144 heads and 20 prompts. It provides quantitative evidence that per-head two-body interactions can predict next-token preferences, most notably via the antagonistic head L3H5, and demonstrates causality with head ablations. The authors then propose a three-body extension of the framework and pivot to spin-dynamical generative modeling, introducing attention-guided diffusion on the sphere that combines an analytic drift derived from context fields with a learned residual. The work bridges condensed-matter physics and NLP, offering a principled interpretability toolkit, potential interventions for bias and robustness, and a novel diffusion paradigm grounded in Landau-Lifshitz-Gilbert (LLG)-like dynamics on manifolds.
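The sphere-valued dynamics mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes that each token state is a unit 3-vector spin `s`, that a context field `h` has already been computed, and that "LLG-like" refers to the standard Landau-Lifshitz form of the LLG equation (precession plus damping). The helpers `llg_drift` and `sphere_step`, the placeholder `residual_fn` standing in for the learned residual, and the Euler-Maruyama discretization with tangent-plane noise are all hypothetical, illustrative choices rather than the authors' model.

```python
import numpy as np


def llg_drift(s, h, gamma=1.0, alpha=0.1):
    """Landau-Lifshitz form of the LLG right-hand side for a unit spin s in field h.

    ds/dt = -gamma / (1 + alpha**2) * (s x h + alpha * s x (s x h))
    Both terms are perpendicular to s, so the exact flow stays on the unit sphere.
    """
    prefactor = -gamma / (1.0 + alpha ** 2)
    s_cross_h = np.cross(s, h)
    return prefactor * (s_cross_h + alpha * np.cross(s, s_cross_h))


def sphere_step(s, h, dt, residual_fn=None, sigma=0.0, rng=None,
                gamma=1.0, alpha=0.1):
    """One Euler-Maruyama step on the unit sphere (illustrative, hypothetical).

    h           : context field acting on the spin (assumed precomputed).
    residual_fn : placeholder for a learned residual drift; None disables it.
    sigma       : noise amplitude; the noise is projected onto the tangent plane.
    """
    drift = llg_drift(s, h, gamma, alpha)
    if residual_fn is not None:
        r = residual_fn(s)
        drift = drift + (r - np.dot(r, s) * s)  # keep only the tangential part
    step = drift * dt
    if sigma > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        xi = rng.standard_normal(s.shape)
        xi = xi - np.dot(xi, s) * s             # tangent-plane Gaussian noise
        step = step + sigma * np.sqrt(dt) * xi
    s_new = s + step
    return s_new / np.linalg.norm(s_new)        # retract back onto the sphere


# Usage: a spin starting perpendicular to the field relaxes toward it.
s = np.array([1.0, 0.0, 0.0])
h = np.array([0.0, 0.0, 1.0])
for _ in range(2000):
    s = sphere_step(s, h, dt=0.05, alpha=1.0)
print(np.dot(s, h))  # approaches 1.0 as the damping term aligns s with h
```

Because the damping term is proportional to `h - (s·h) s`, the component of the field tangent to the sphere, it is what pulls the spin toward the field, while the precession term only rotates it about `h`; a final renormalization keeps the discrete step on the sphere.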

Abstract

The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena such as repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase boundaries and logit-gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation of 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. The context-field lens used in this work provides physics-grounded interpretability and motivates the development of novel generative models that bridge theoretical condensed-matter physics and artificial intelligence.
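The per-head two-body coupling described above can be sketched under stated assumptions. GPT-2 small has 12 layers of 12 heads (the 144 heads evaluated here) and a 768-dimensional residual stream; the sketch assumes the Hugging Face layout, in which each layer's fused `c_attn` weight has shape `(768, 2304)` with columns ordered `[Q | K | V]` and each block split into 64-column heads. It treats the pre-softmax attention logit between a query token `x_q` and a key token `x_k` as the bilinear interaction `x_q^T J_h x_k` with `J_h = W_Q^h (W_K^h)^T / sqrt(d_head)`, ignores biases and the preceding layer normalization, and uses the difference of two such logits only as an illustrative stand-in for the paper's logit-gap criterion. The names `head_coupling`, `pair_energy`, and `logit_gap` are hypothetical.

```python
import numpy as np


def head_coupling(c_attn_w, head, n_head):
    """Bilinear coupling J_h = W_Q^h (W_K^h)^T / sqrt(d_head) for one attention head.

    c_attn_w : fused QKV weight, shape (d_model, 3 * d_model), columns ordered
               [Q | K | V] (assumed Hugging Face GPT-2 Conv1D layout, x @ W).
    Biases and the preceding layer norm are ignored in this sketch.
    """
    d_model = c_attn_w.shape[0]
    d_head = d_model // n_head
    cols = slice(head * d_head, (head + 1) * d_head)
    w_q = c_attn_w[:, :d_model][:, cols]              # (d_model, d_head)
    w_k = c_attn_w[:, d_model:2 * d_model][:, cols]   # (d_model, d_head)
    return w_q @ w_k.T / np.sqrt(d_head)              # (d_model, d_model)


def pair_energy(J, x_query, x_key):
    """Two-body 'energy' E = -x_q^T J x_k, i.e. minus the pre-softmax attention logit."""
    return -float(x_query @ J @ x_key)


def logit_gap(J, x_query, x_a, x_b):
    """Illustrative logit gap between candidate tokens a and b for one query.

    Positive when the head scores a above b (a has the lower two-body energy).
    """
    return pair_energy(J, x_query, x_b) - pair_energy(J, x_query, x_a)


# Usage with small random weights standing in for a real GPT-2 layer.
rng = np.random.default_rng(0)
d_model, n_head = 8, 2
c_attn_w = rng.standard_normal((d_model, 3 * d_model))
J = head_coupling(c_attn_w, head=1, n_head=n_head)
x_q, x_a, x_b = rng.standard_normal((3, d_model))
print(J.shape, logit_gap(J, x_q, x_a, x_b))
```

With real weights, `c_attn_w` would be one layer's fused matrix (for example `model.transformer.h[3].attn.c_attn.weight` in a Hugging Face `GPT2LMHeadModel`, converted with `.detach().numpy()`), with `n_head=12` and `head=5` selecting L3H5.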

Paper Structure

This paper contains 32 sections, 52 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Statistical Validation of Head L3H5: The theoretical logit difference from head L3H5 ($\Delta L_{\text{theory}}$) is plotted against the final model's actual logit difference ($\Delta L_{\text{actual}}$) for 20 prompts. The strong negative correlation ($r^2=0.478, p<0.001$) demonstrates that this head has a significant and consistently antagonistic function for this task.
  • Figure 2: Decision Landscape of Head 5 in Layer 3 (L3H5). Prompt: “The capital of France is Paris. The capital of Germany is…”
  • Figure 3: Order Parameters vs. Global Generation Temperature for GPT-2
  • Figure 4: Causal effect of head ablation: The bar chart shows the change in the model's output logit difference for a challenging prompt. Ablating the statistically identified antagonistic head L3H5 has a negligible effect. In contrast, ablating a control head, L0H0, significantly worsens the model's output, revealing L0H0's crucial but context-specific positive role.
  • Figure 5: Workflow used to test the spin-bath model of attention on a pre-trained GPT-2 transformer.
  • ...and 2 more figures
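The statistics reported for Figure 1 can be checked with a short, dependency-free sketch. It computes the Pearson correlation between paired values such as ΔL_theory and ΔL_actual, and the associated t-statistic with n - 2 degrees of freedom. The helper names `pearson_r` and `t_statistic` are hypothetical, and the example plugs in only the reported summary numbers (n = 20, r² = 0.478, negative sign), not the underlying per-prompt data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Plug in only the summary values reported for Figure 1:
# n = 20 prompts, r^2 = 0.478, negative sign.
t = t_statistic(-math.sqrt(0.478), 20)
print(f"t = {t:.2f}")  # prints "t = -4.06"
```

A value of |t| ≈ 4.06 exceeds the two-sided critical value of roughly 3.92 for 18 degrees of freedom at α = 0.001, which is consistent with the reported p < 0.001; likewise, sqrt(0.478) ≈ 0.69 matches the abstract's r ≈ -0.70.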