States of LLM-generated Texts and Phase Transitions between them

Nikolay Mikhaylovskiy

TL;DR

To understand how generation temperature shapes the long-range statistical structure of LLM outputs, the paper adopts an empirical phase analysis that maps autocorrelation behavior to solid-, critical-, and gas-like states. It analyzes two transformer models (Qwen and Phi) across multiple temperatures, using distributional semantics (GloVe) to compute vector-based autocorrelations and Fourier analysis to detect phase boundaries. The key findings include a phase transition near $T \approx 0.8$ from a periodic (solid) to an amorphous (gas) state, exponential decay of long-range correlations in the amorphous phase, and mid-range power-law decay (up to roughly 2000 words) in the critical regime, suggesting islands of connectivity. This phase-centric view points to a potential universality class for transformer-based LLMs and offers a framework for studying how architecture and scale influence text structure.
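The TL;DR does not give the exact definition of a vector-based autocorrelation. A minimal sketch, assuming the autocorrelation at lag $\tau$ is the mean cosine similarity between mean-centered embedding vectors of words $\tau$ positions apart, is shown here. Random vectors stand in for GloVe embeddings, and `vector_autocorrelation` is a hypothetical name, not the paper's code.

```python
import numpy as np


def vector_autocorrelation(vectors: np.ndarray, max_lag: int) -> np.ndarray:
    """Mean cosine similarity between word vectors `lag` positions apart.

    Assumption: one plausible definition of a vector-based autocorrelation;
    the paper's exact formula may differ.
    vectors: array of shape (n_words, dim), one embedding per word.
    Returns c, where c[lag - 1] is the autocorrelation at that lag.
    """
    # Center the vectors so a shared mean direction does not inflate correlations.
    centered = vectors - vectors.mean(axis=0)
    # Scale each vector to unit length (guarding against zero vectors).
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.where(norms == 0, 1.0, norms)
    result = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Row-wise dot products of unit vectors are cosine similarities at this lag.
        result[lag - 1] = np.mean(np.sum(unit[:-lag] * unit[lag:], axis=1))
    return result


# Random vectors standing in for GloVe embeddings (hypothetical data).
rng = np.random.default_rng(0)
words = rng.normal(size=(5000, 50))
acf = vector_autocorrelation(words, max_lag=100)
print(acf[:5])
```

On random vectors the curve stays near zero at every lag, whereas a text that repeats itself with period $p$ gives values close to 1 at multiples of $p$, the kind of periodic signature the TL;DR associates with the solid phase.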

Abstract

It has been known for some time that autocorrelations of words in human-written texts decay according to a power law. Recent works have also shown that the decay of autocorrelations in texts generated by LLMs is qualitatively different from that in literary texts. Solid-state physics ties autocorrelation decay laws to the states of matter. In this work, we empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that can be classified as solid, critical state, or gas.
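The abstract contrasts power-law decay with other decay laws, and the TL;DR reports exponential decay in the amorphous phase. A simple heuristic for telling the two apart, assuming a clean, positive autocorrelation curve, is to fit each model as a straight line in log space and compare residuals: a power law $C(\tau) \propto \tau^{-\alpha}$ is linear on log-log axes, while an exponential $C(\tau) \propto e^{-\tau/\xi}$ is linear on semi-log axes. `compare_decay_fits` is a hypothetical helper, not the paper's fitting procedure.

```python
import numpy as np


def compare_decay_fits(lags: np.ndarray, acf: np.ndarray) -> dict:
    """Fit power-law and exponential decay to positive autocorrelation values.

    Heuristic sketch (an assumption, not the paper's procedure):
    - power law   C(tau) = A * tau**(-alpha)  -> linear in (log tau, log C)
    - exponential C(tau) = A * exp(-tau / xi) -> linear in (tau, log C)
    The model with the smaller squared residual in log C is reported as better.
    """
    mask = acf > 0  # logarithms need positive values
    tau, log_c = lags[mask].astype(float), np.log(acf[mask])

    # Power law: the slope of log C against log tau is -alpha.
    p_slope, p_icpt = np.polyfit(np.log(tau), log_c, 1)
    p_resid = np.sum((log_c - (p_slope * np.log(tau) + p_icpt)) ** 2)

    # Exponential: the slope of log C against tau is -1/xi.
    e_slope, e_icpt = np.polyfit(tau, log_c, 1)
    e_resid = np.sum((log_c - (e_slope * tau + e_icpt)) ** 2)

    return {
        "alpha": -p_slope,
        "xi": -1.0 / e_slope,
        "better": "power law" if p_resid < e_resid else "exponential",
    }


lags = np.arange(1, 200)
print(compare_decay_fits(lags, 0.5 * lags ** -0.7)["better"])          # -> power law
print(compare_decay_fits(lags, 0.5 * np.exp(-lags / 20.0))["better"])  # -> exponential
```

Real autocorrelation curves are noisy and may change sign, so in practice the fit range (for example, the mid-range lags where the TL;DR reports power-law behavior) matters as much as the model choice.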

Paper Structure

This paper contains 13 sections, 6 equations, and 13 figures.

Figures (13)

  • Figure 1: Degenerative Text Generated by Qwen at $t = 0.1$, shift 11904, seed 1
  • Figure 2: Nonsense Text Generated by Phi at $t = 2.8$, shift 539, seed 1
  • Figure 3: Gibberish Text Generated by Phi at $t = 2.8$, shift 9794, seed 1
  • Figure 4: Text Generated by Phi at $t = 1.0$, shift 9816, seed 1
  • Figure 5: Autocorrelation Function of the Text Generated by Phi at $t = 0.4$, seed 2, and Its FFT
  • ...and 8 more figures
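Figure 5 pairs an autocorrelation function with its FFT. One way to read such a spectrum, assuming a periodic (solid-like) text concentrates its spectral power in a single frequency, is to locate the dominant non-zero frequency and measure what share of the power it carries. `dominant_period` is a hypothetical helper, and the synthetic cosine ACF is illustrative data, not a result from the paper.

```python
import numpy as np


def dominant_period(acf: np.ndarray) -> tuple[float, float]:
    """Return (period in words, share of spectral power in the peak bin).

    Sketch under assumptions: a periodic text should yield an ACF whose
    power spectrum is dominated by one frequency; the paper's own reading
    of its FFT plots may differ.
    """
    # Remove the mean so the zero-frequency (DC) bin does not dominate.
    spectrum = np.abs(np.fft.rfft(acf - acf.mean())) ** 2
    freqs = np.fft.rfftfreq(len(acf), d=1.0)  # cycles per word
    k = int(np.argmax(spectrum[1:])) + 1       # skip the DC bin
    return 1.0 / freqs[k], spectrum[k] / spectrum[1:].sum()


# Synthetic ACF of a text that repeats every 25 words (hypothetical data).
lags = np.arange(1000)
period, share = dominant_period(np.cos(2 * np.pi * lags / 25))
print(round(period, 1), share > 0.9)  # -> 25.0 True
```

A single sharp peak holding most of the power points to periodic structure, while an amorphous text spreads its power across many frequencies and yields a small peak share.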