Phase Transitions in Large Language Models and the $O(N)$ Model
Youran Sun, Babak Haghighat
TL;DR
This work links large language model scaling to phase-transition physics by recasting the Transformer as an $O(N)$ model. It identifies two distinct transitions: a temperature-driven transition that reveals the model's internal dimension, and a higher-depth transition tied to model size that signals emergent capabilities beyond a critical parameter count of around $P_c \approx 7$B. The energy of the $O(N)$ formulation, defined as $E=\frac{1}{L}\sum_{\sigma,\tau} t_\sigma \cdot t_\tau$, serves as a fast indicator of training status and of whether a model's parameters are sufficient to learn its training data. The results support a renormalization-group-style flow from human to machine language and suggest that large models occupy a different regime from small models, with practical implications for scaling decisions and diagnostic tools.
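The energy formula above can be made concrete with a small worked sketch. The text does not define $t_\sigma$ or $L$ at this point, so the following assumes $t_\sigma$ is the $N$-dimensional vector at position $\sigma$ of a length-$L$ sequence and that the double sum runs over all ordered pairs, including $\sigma = \tau$, with the sign exactly as written. Under that reading the double sum collapses to $\lVert \sum_\sigma t_\sigma \rVert^2$, so $E$ is the squared norm of the summed vectors divided by $L$. The function names `on_energy` and `on_energy_fast` are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence

# Hypothetical helper types: each "token vector" t_sigma is a list of N floats.
Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product t_sigma . t_tau."""
    return sum(a * b for a, b in zip(u, v))


def on_energy(tokens: Sequence[Vector]) -> float:
    """E = (1/L) * sum over all ordered pairs (sigma, tau) of t_sigma . t_tau.

    Assumption: the sum includes sigma == tau and carries no minus sign,
    following the formula as written in the text. Cost is O(L^2 * N).
    """
    L = len(tokens)
    total = 0.0
    for t_sigma in tokens:
        for t_tau in tokens:
            total += dot(t_sigma, t_tau)
    return total / L


def on_energy_fast(tokens: Sequence[Vector]) -> float:
    """Same quantity in O(L * N): sum_{sigma,tau} t_sigma . t_tau = |sum_sigma t_sigma|^2."""
    L = len(tokens)
    n = len(tokens[0])
    s = [sum(t[i] for t in tokens) for i in range(n)]  # s = sum_sigma t_sigma
    return dot(s, s) / L


if __name__ == "__main__":
    aligned = [[1.0, 0.0]] * 4                           # all vectors point the same way
    cancelling = [[1.0, 0.0], [-1.0, 0.0]] * 2           # pairs cancel exactly
    mixed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]         # sum = (2, 1), |sum|^2 = 5
    print(on_energy(aligned), on_energy(cancelling), on_energy(mixed))
```

With this normalization, $L$ identical unit vectors give $E = L$, while vectors that cancel give $E = 0$; independent random unit vectors with zero mean give $E$ of order one on average. The closed form is the practical choice for long sequences, since it avoids the quadratic pair loop.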
Abstract
Large language models (LLMs) exhibit unprecedentedly rich scaling behaviors. In physics, scaling behavior is closely related to phase transitions, critical phenomena, and field theory. To investigate the phase transition phenomena in LLMs, we reformulated the Transformer architecture as an $O(N)$ model. Our study reveals two distinct phase transitions corresponding to the temperature used in text generation and the model's parameter size, respectively. The first phase transition enables us to estimate the internal dimension of the model, while the second phase transition is of \textit{higher-depth} and signals the emergence of new capabilities. As an application, the energy of the $O(N)$ model can be used to evaluate whether an LLM's parameters are sufficient to learn the training data.
