Phase Transitions in Large Language Models and the $O(N)$ Model

Youran Sun; Babak Haghighat

TL;DR

This work links large language model scaling to phase-transition physics by recasting the Transformer as an $O(N)$ model. It identifies two distinct transitions: a temperature-driven transition that reveals the model's internal dimension, and a higher-depth transition tied to model size that signals emergent capabilities beyond a critical parameter count $P_c \approx 7$B. The energy of the $O(N)$ formulation, defined as $E=\frac{1}{L}\sum_{\sigma,\tau} t_\sigma \cdot t_\tau$, serves as a fast indicator of training status and data sufficiency. The results support a renormalization-group-style flow from human to machine language and suggest that large models occupy a different regime from small models, with practical implications for scaling decisions and diagnostic tools.
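To make the energy formula concrete, here is a minimal Python sketch that evaluates $E=\frac{1}{L}\sum_{\sigma,\tau} t_\sigma \cdot t_\tau$ exactly as written above. It assumes that each $t_\sigma$ is an $N$-component vector attached to one of $L$ token positions, and that the sum runs over all ordered pairs, including $\sigma=\tau$. The paper may restrict or weight the pairs differently, and the function name `on_energy` is illustrative only.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def on_energy(spins: Sequence[Sequence[float]]) -> float:
    """Evaluate E = (1/L) * sum over (sigma, tau) of t_sigma . t_tau.

    `spins` holds L vectors t_sigma, each with N components.
    Assumption: the sum covers every ordered pair, including sigma == tau.
    """
    L = len(spins)
    if L == 0:
        raise ValueError("need at least one vector")
    return sum(dot(s, t) for s in spins for t in spins) / L
```

Under this all-pairs reading, the double sum equals $\frac{1}{L}\left|\sum_\sigma t_\sigma\right|^2$, so aligned vectors give a large $E$ and vectors that cancel give $E$ near zero. For example, `on_energy([[1.0, 0.0], [-1.0, 0.0]])` evaluates to `0.0`.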

Abstract

Large language models (LLMs) exhibit unprecedentedly rich scaling behaviors. In physics, scaling behavior is closely related to phase transitions, critical phenomena, and field theory. To investigate the phase transition phenomena in LLMs, we reformulated the Transformer architecture as an $O(N)$ model. Our study reveals two distinct phase transitions corresponding to the temperature used in text generation and the model's parameter size, respectively. The first phase transition enables us to estimate the internal dimension of the model, while the second phase transition is of \textit{higher-depth} and signals the emergence of new capabilities. As an application, the energy of the $O(N)$ model can be used to evaluate whether an LLM's parameters are sufficient to learn the training data.
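For readers less familiar with the sampling temperature mentioned in the abstract, the sketch below shows the standard temperature-scaled softmax, $p_i \propto \exp(z_i / T)$, applied to next-token logits. It illustrates the general technique only and is not the authors' experimental code; the function name and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Return token probabilities p_i proportional to exp(z_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
cold = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.1)
hot = softmax_with_temperature([2.0, 1.0, 0.0], temperature=10.0)
```

At low temperature nearly all probability mass sits on the highest-logit token, while at high temperature the distribution approaches uniform. This is the control parameter that the first phase transition varies.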
