Phase Transitions in Large Language Models and the $O(N)$ Model

Youran Sun, Babak Haghighat

TL;DR

This work links large language model scaling to phase-transition physics by recasting the Transformer as an $O(N)$ model. It identifies two distinct transitions: a temperature-driven transition that reveals the model's internal dimension, and a higher-depth transition tied to model size that indicates emergent capabilities beyond a critical parameter count of $P_c \approx 7$B. The energy of the $O(N)$ formulation, defined as $E=\frac{1}{L}\sum_{\sigma,\tau} t_\sigma \cdot t_\tau$, serves as a fast indicator of training status and data sufficiency. The results support a renormalization-group-style flow from human to machine language and suggest that large models occupy a different regime from small models, with practical implications for scaling decisions and diagnostic tools.

Abstract

Large language models (LLMs) exhibit unprecedentedly rich scaling behaviors. In physics, scaling behavior is closely related to phase transitions, critical phenomena, and field theory. To investigate phase transition phenomena in LLMs, we reformulate the Transformer architecture as an $O(N)$ model. Our study reveals two distinct phase transitions, one driven by the temperature used in text generation and the other by the model's parameter size. The first phase transition enables us to estimate the internal dimension of the model, while the second phase transition is of higher depth and signals the emergence of new capabilities. As an application, the energy of the $O(N)$ model can be used to evaluate whether an LLM's parameters are sufficient to learn the training data.
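To make the energy from the TL;DR concrete, the following is a minimal sketch of $E=\frac{1}{L}\sum_{\sigma,\tau} t_\sigma \cdot t_\tau$ in plain Python. Several details are assumptions rather than statements from the paper: the vectors $t_\sigma$ are taken to be unit-normalized, the sum is taken over all ordered pairs including $\sigma=\tau$, and the function names `normalize` and `energy` are hypothetical. The paper may restrict the sum or obtain the vectors differently.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumed O(N)-style normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def energy(vectors):
    """Compute E = (1/L) * sum_{sigma,tau} t_sigma . t_tau.

    Assumption: the sum runs over all ordered pairs (sigma, tau),
    including sigma == tau; L is the number of vectors.
    """
    L = len(vectors)
    total = 0.0
    for t_sigma in vectors:
        for t_tau in vectors:
            total += sum(a * b for a, b in zip(t_sigma, t_tau))
    return total / L

# Toy inputs (illustrative only, not data from the paper).
aligned = [normalize([1.0, 0.0])] * 4                       # all vectors parallel
opposed = [normalize([1.0, 0.0]), normalize([-1.0, 0.0])] * 2  # alternating directions

e_aligned = energy(aligned)
e_opposed = energy(opposed)
print(e_aligned, e_opposed)
```

Under these assumptions the double sum equals the squared norm of $\sum_\sigma t_\sigma$ divided by $L$, so fully aligned vectors give the largest value and vectors that cancel give zero.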

Paper Structure

This paper contains 18 sections, 23 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Demonstration of how to reformulate the Transformer as an $O(N)$ model.
  • Figure 2: Energy of Qwen2.5 models. The temperature is the one used when generating text; higher temperatures result in more random generation. The energy is computed by Eq. \ref{eq:defE}.
  • Figure 3: Energy-temperature curve of small LLMs. This figure shows the energy of Qwen2.5-0.5B.
  • Figure 4: Energy-temperature curve of large LLMs. This figure shows the energy of Qwen2.5-32B.
  • Figure 5: Energy of Qwen2.5-Math and Qwen2.5-Coder models.
  • ...and 1 more figure
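The Figure 2 caption notes that higher generation temperatures produce more random text. A minimal sketch of why, assuming the common convention that the next-token logits are divided by the temperature before the softmax (the source does not spell out its sampling setup; the function names and the toy logits are hypothetical):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; larger means a more random distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.0]  # toy next-token scores, illustrative only
cold = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
hot = softmax_with_temperature(logits, 2.0)   # high temperature: flatter

print(entropy(cold), entropy(hot))
```

With the same logits, the higher temperature yields a flatter distribution with higher entropy, which is the sense in which generation becomes more random.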