Attention to Order: Transformers Discover Phase Transitions via Learnability
Şener Özönder
TL;DR
The paper addresses the problem of identifying phase transitions in systems that lack analytic solutions by proposing learnability as a universal criterion. It trains an encoder-only transformer with a masked-language-modeling objective on Monte Carlo-generated 2D Ising configurations, using a 2D positional encoding. Ordered phases exhibit enhanced learnability, evidenced by lower final training loss and structured attention patterns, while disordered phases resist learning. Two unsupervised diagnostics, the sharp jump in final training loss as the temperature crosses the transition and the abrupt rise in attention entropy with temperature, recover the critical temperature, $T_c \approx 2.27$, in excellent agreement with the exact Onsager value $T_c = 2/\ln(1+\sqrt{2}) \approx 2.269$ (in units of $J/k_B$). The results establish learnability as a data-driven marker of phase transitions and reveal deep parallels between long-range order in condensed matter and the emergence of structure in large language models, and they suggest applicability to frustrated or disordered systems.
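To make the data pipeline concrete, the following is a minimal sketch of how masked-language-modeling inputs could be built from Monte Carlo samples. It assumes single-spin-flip Metropolis updates with $J = k_B = 1$ and periodic boundaries, a row-major flattening of the lattice, and a 15% mask fraction; the paper's actual sampler, lattice size, tokenization, and masking rate are not stated here, and the names `metropolis_ising`, `mask_tokens`, and `mask_id` are hypothetical.

```python
import math
import random

def metropolis_ising(L, T, sweeps, rng):
    """Sample one L x L Ising configuration (assumed J = 1, k_B = 1, periodic)."""
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Sum of the four nearest neighbours with periodic wrap-around.
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2 * spins[i][j] * nb  # energy change if spin (i, j) is flipped
        # Metropolis rule: always accept downhill moves, otherwise accept
        # with probability exp(-dE / T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -spins[i][j]
    return spins

def mask_tokens(spins, mask_frac, rng, mask_id=2):
    """Map spins {-1, +1} to tokens {0, 1} and mask a random subset of sites."""
    tokens = [(s + 1) // 2 for row in spins for s in row]  # row-major flatten
    n_mask = max(1, round(mask_frac * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    for k in masked_idx:
        inputs[k] = mask_id  # hypothetical [MASK] token id
    return inputs, tokens, masked_idx

rng = random.Random(0)
config = metropolis_ising(L=8, T=1.5, sweeps=50, rng=rng)
inputs, targets, masked_idx = mask_tokens(config, mask_frac=0.15, rng=rng)
```

A model trained on such pairs would be scored only on the positions in `masked_idx`; repeating this over a grid of temperatures gives the per-temperature training losses that the first diagnostic compares.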
Abstract
Phase transitions mark qualitative reorganizations of collective behavior, yet identifying their boundaries remains challenging whenever analytic solutions are absent and conventional simulations fail. Here we introduce learnability as a universal criterion, defined as the ability of a transformer model with an attention mechanism to extract structure from microscopic states. Using self-supervised learning and Monte Carlo-generated configurations of the two-dimensional Ising model, we show that ordered phases correspond to enhanced learnability, manifested in both reduced training loss and structured attention patterns, while disordered phases remain resistant to learning. Two unsupervised diagnostics, the sharp jump in training loss and the rise in attention entropy, recover the critical temperature in excellent agreement with the exact value. Our results establish learnability as a data-driven marker of phase transitions and highlight deep parallels between long-range order in condensed matter and the emergence of structure in modern language models.
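One plausible reading of the attention-entropy diagnostic is the Shannon entropy of each query's attention distribution, averaged over queries. The sketch below assumes exactly that definition (in nats); the abstract does not specify how the entropy is aggregated over heads, layers, or samples, and `attention_entropy` is a hypothetical helper name.

```python
import math

def attention_entropy(attn):
    """Mean Shannon entropy (nats) over the rows of a row-stochastic matrix.

    attn[q][k] is the attention weight from query q to key k; each row is
    assumed to sum to 1, as it does after a softmax.
    """
    total = 0.0
    for row in attn:
        # Zero weights contribute nothing (the limit of p * log p as p -> 0).
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(attn)

n = 4
# Attention spread evenly over all keys: the maximum-entropy case, log(n).
uniform = [[1.0 / n] * n for _ in range(n)]
# Each query attends to a single key: the minimum-entropy case, 0.
peaked = [[1.0 if k == q else 0.0 for k in range(n)] for q in range(n)]

h_uniform = attention_entropy(uniform)
h_peaked = attention_entropy(peaked)
```

Under this definition, structured attention concentrated on a few sites yields low entropy and diffuse attention yields entropy near $\ln n$, so tracking the mean entropy as a function of temperature gives a curve whose rise can be compared with the known critical temperature.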
