States of LLM-generated Texts and Phase Transitions between them
Nikolay Mikhaylovskiy
TL;DR
To understand how generation temperature shapes long-range statistical structure in LLM outputs, the paper takes an empirical phase-analysis approach that maps autocorrelation behavior to solid-like, critical, and gas-like states. It analyzes two transformer models (Qwen and Phi) across multiple temperatures, using distributional semantics (GloVe) to compute vector-based autocorrelations and Fourier analysis to detect phase boundaries. The key findings are a phase transition near $T \approx 0.8$ from a periodic (solid) state to an amorphous (gas) state, exponential decay of long-range correlations in the amorphous phase, and mid-range power-law decay (up to roughly 2000 words) in the critical regime, which suggests islands of connectivity. This phase-centric view points to a potential universality class for transformer-based LLMs and offers a framework for studying how architecture and scale influence text structure.
Abstract
It has been known for some time that autocorrelations of words in human-written texts decay according to a power law. Recent works have also shown that the decay of autocorrelations in texts generated by LLMs is qualitatively different from that in literary texts. Solid-state physics ties autocorrelation decay laws to states of matter. In this work, we empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that can be classified as a solid, a critical state, or a gas.
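
To make the distinction between decay laws concrete, the following is a minimal sketch, not the paper's implementation. It assumes the text has already been mapped to a sequence of word vectors (for example, GloVe embeddings), defines the autocorrelation at lag tau as the mean dot product of mean-centered vectors tau positions apart, and compares a power-law fit with an exponential fit by least squares in log space. The function names, the centering step, and the fitting procedure are illustrative assumptions.

```python
import numpy as np


def vector_autocorrelation(vectors, max_lag):
    """Autocorrelation of a word-vector sequence for lags 1..max_lag.

    Assumed definition: C(tau) is the mean over t of the dot product
    between the mean-centered vectors at positions t and t + tau.
    """
    v = np.asarray(vectors, dtype=float)
    v = v - v.mean(axis=0)  # remove the average vector of the text
    n = len(v)
    return np.array([
        np.mean(np.sum(v[:n - lag] * v[lag:], axis=1))
        for lag in range(1, max_lag + 1)
    ])


def decay_fits(corr):
    """Fit power-law and exponential decay to the positive part of C(tau).

    Power law:   log C = a - alpha * log(tau)
    Exponential: log C = a - rate * tau
    Returns the fitted exponents and the squared residuals of both fits.
    """
    corr = np.asarray(corr, dtype=float)
    lags = np.arange(1, len(corr) + 1)
    mask = corr > 0  # the logarithm is only defined for positive values
    tau, log_c = lags[mask], np.log(corr[mask])

    p_pow = np.polyfit(np.log(tau), log_c, 1)
    p_exp = np.polyfit(tau, log_c, 1)
    res_pow = float(np.sum((np.polyval(p_pow, np.log(tau)) - log_c) ** 2))
    res_exp = float(np.sum((np.polyval(p_exp, tau) - log_c) ** 2))

    return {
        "power_law_exponent": float(-p_pow[0]),
        "power_law_residual": res_pow,
        "exp_rate": float(-p_exp[0]),
        "exp_residual": res_exp,
    }
```

Under these assumptions, a smaller power-law residual over the mid-range lags would be consistent with the critical regime, and a smaller exponential residual with the gas-like regime. A strictly periodic sequence yields an autocorrelation that oscillates without decaying, which is the kind of signature a Fourier analysis can pick up.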
