Berezinskii--Kosterlitz--Thouless transition in a context-sensitive random language model

Yuma Toji, Jun Takahashi, Vwani Roychowdhury, Hideyuki Miyahara

TL;DR

This work asks whether language-generating systems can exhibit genuine phase-transition behavior. It introduces a context-sensitive random language model, a probabilistic context-sensitive grammar (CSG) that combines growth rules with long-range, context-dependent interactions inspired by the one-dimensional long-range Potts model, and analyzes ordering through an order parameter $M$, the susceptibility, and the Binder parameter. The authors demonstrate a Berezinskii--Kosterlitz--Thouless (BKT)-type transition with an extended critical phase, evidenced by power-law correlations and by finite-size scaling that yields estimates of the critical temperature $T_\mathrm{c}$ and the exponents $\nu$ and $\gamma$ in representative parameter regimes; whether the transition occurs depends on the growth rate and the alphabet size. The findings suggest that robust scaling laws in natural languages and modern language models may arise from intrinsic grammatical and long-range coherence mechanisms rather than from fine-tuning, offering a thermodynamic lens on language structure and emergent capabilities. This minimal, analysable framework opens avenues to connect linguistic growth, attention-like long-range interactions, and critical phenomena in NLP systems.

Abstract

Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been documented for decades. The recent rise of large language models has added further evidence and excitement by providing intriguing similarities with notions in physics such as scaling laws and emergent abilities. However, specific instances of classes of generative language models that exhibit phase transitions, as understood by the statistical physics community, are lacking. In this work, inspired by the one-dimensional Potts model in statistical physics, we construct a simple probabilistic language model that falls under the class of context-sensitive grammars, which we call the context-sensitive random language model, and numerically demonstrate an unambiguous phase transition in the framework of a natural language model. We explicitly show that a precisely defined order parameter -- that captures symbol frequency biases in the sentences generated by the language model -- changes from strictly zero to a strictly nonzero value (in the infinite-length limit of sentences), implying a mathematical singularity arising when tuning the parameter of the stochastic language model we consider. Furthermore, we identify the phase transition as a variant of the Berezinskii--Kosterlitz--Thouless (BKT) transition, which is known to exhibit critical properties not only at the transition point but also in the entire phase. This finding leads to the possibility that critical properties in natural languages may not require careful fine-tuning nor self-organized criticality, but are generically explained by the underlying connection between language structures and the BKT phases.
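
As a minimal sketch of what an order parameter that "captures symbol frequency biases" could look like, the Python snippet below computes a Potts-style magnetization from a single generated sentence. The normalization $(K f_{\max} - 1)/(K - 1)$ is the common Potts-model convention and is an assumption here; the paper's own definition is not reproduced in this summary and may differ. The function name `potts_order_parameter` is hypothetical.

```python
from collections import Counter


def potts_order_parameter(sentence, alphabet_size):
    """Potts-style magnetization of one sentence (assumed convention).

    Returns (K * f_max - 1) / (K - 1), where f_max is the relative
    frequency of the most common symbol and K is the alphabet size.
    The value is 0 for perfectly balanced symbols and 1 when a single
    symbol fills the whole sentence.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if len(sentence) == 0:
        raise ValueError("sentence must not be empty")
    counts = Counter(sentence)
    f_max = max(counts.values()) / len(sentence)
    return (alphabet_size * f_max - 1.0) / (alphabet_size - 1.0)


# Toy sentences over a two-symbol alphabet (K = 2), for illustration only.
balanced = [0, 1] * 8          # both symbols equally frequent
biased = [0] * 15 + [1]        # one symbol dominates

print(potts_order_parameter(balanced, 2))  # 0.0
print(potts_order_parameter(biased, 2))    # 0.875
```

For $K = 2$ this reduces to the absolute difference of the two symbol frequencies, which is the sense in which a nonzero value signals a frequency bias.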

Paper Structure

This paper contains 48 sections, 39 equations, 68 figures, 1 algorithm.

Figures (68)

  • Figure 1: (a) Schematic of a CFG. This diagram shows the syntactic structure of "The boy with blue pants found a cat." Natural languages are, however, context-sensitive, and in this paper we define a simple CSG that demonstrates a phase transition. (b) We consider the relative frequencies of letters or symbols in the alphabet (as captured by the net magnetization of the corresponding physical system) as an order parameter in our CSG. The Binder parameter, which is the normalized kurtosis of the underlying order parameter [see Eq. \ref{main_eq_def_Binder_parameter_001_001}], can capture phase transitions effectively, especially for one-dimensional models. A schematic of the Binder parameter is shown for the standard second-order phase transition (upper) and the BKT transition (lower). For the BKT transition, the Binder parameter's dependence on temperature, which determines the probability with which a rule is applied in the underlying CSG, does not become a step function in the thermodynamic limit. (c) Correlation functions for the disordered, critical, and ordered phases. When a system is critical, the correlation function decays polynomially, that is, as a power law.
  • Figure 2: Typical configurations of symbols for $X \to YZ$. We plot (upper) states whose magnetization lies between $0.9000$ and $0.9200$ at $k_\mathrm{B} T = 0.760$, and (lower) states whose magnetization lies between $0.0000$ and $0.0200$ at $k_\mathrm{B} T = 1.160$. We set $K = 2$, $J = 1.0$, $q = 10^{-2.0}$, $t = 0$, $s = 0.9$, and $r_- = r_+ = 0.25$. We plot the first 256 symbols out of 4096. Note that the estimated critical temperature is $k_\mathrm{B} T_* = 0.960$.
  • Figure 3: Temperature dependence of (upper) the magnetization, Eq. \ref{main_eq_def_magnetization_001_001}, (middle) the susceptibility, Eq. \ref{main_eq_def_specific_heat_001_001}, and (lower) the Binder parameter, Eq. \ref{main_eq_def_Binder_parameter_001_001}, for $X \to YZ$. We set $K = 2$, $J = 1.0$, $q = 10^{-2.0}$, $t = 0$, $s = 0.9$, and $r_- = r_+ = 0.25$. The length $N$ of the sentence generated by the language model was varied from $16$ to $4096$.
  • Figure 4: Correlation functions, Eq. \ref{main_eq_correlation_function_with_disconnected_diagram_Potts_001_002}, (upper) with $i=2048$ and $j=2048 + \Delta i$ and (lower) with $i = 0$ and $j = \lfloor N / 4 \rfloor - 1$ for $X \to YZ$. We set $K = 2$, $J = 1.0$, $q = 10^{-2.0}$, $t = 0$, $s = 0.9$, and $r_- = r_+ = 0.25$. Temperature $k_\mathrm{B} T$ was varied from $0.1$ to $2.0$. The dashed lines are fitted lines to the data for (upper) $\Delta i \in [10, 1000]$ and (lower) $N \in [100, 1000]$ at $k_\mathrm{B} T = 1.1$.
  • Figure 5: (upper) Phase diagram of the context-sensitive random language model, where the horizontal and vertical axes are the growth rate of a sentence $q$ and critical temperature $k_\mathrm{B} T$, respectively. We consider $X \to YZ$ and set $K = 2$, $J = 1.0$, $t = 0$, $s = 0.9$, and $r_- = r_+ = 0.25$. (middle) Finite-size scaling of $\tilde{\chi}$ at $q = 10^{-2.0}$. We set $T_\mathrm{c} = 0.960$, $\nu = 2.6250$, and $\gamma = 2.05$, where the values of $\nu$ and $\gamma$ are determined such that the scaling assumption holds. We varied $N = 64, 128, 256, 512, 1024, 2048, 4096$. (lower) $q$-dependence of the critical exponents $\nu$ and $\gamma$.
  • ...and 63 more figures
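
The quantities named in the captions of Figures 1, 3, and 5 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by this summary: the Binder parameter is taken in the common form $U = 1 - \langle m^4 \rangle / (3 \langle m^2 \rangle^2)$, the susceptibility in the fluctuation form $N(\langle m^2 \rangle - \langle |m| \rangle^2)/(k_\mathrm{B} T)$, and the finite-size scaling in the standard power-law ansatz $\chi N^{-\gamma/\nu}$ versus $(T - T_\mathrm{c}) N^{1/\nu}$. The paper's own equations, including its definition of $\tilde{\chi}$, may differ. The function names are hypothetical, the magnetization samples and susceptibility values are synthetic placeholders, and only $T_\mathrm{c} = 0.960$, $\nu = 2.6250$, and $\gamma = 2.05$ are taken from the Figure 5 caption.

```python
import numpy as np


def summarize_magnetization(m_samples, n_symbols, kT):
    """Return (<|m|>, susceptibility, Binder parameter) from samples of m.

    Assumed conventions (not taken from the paper):
      chi = N * (<m^2> - <|m|>^2) / kT
      U   = 1 - <m^4> / (3 * <m^2>^2)
    """
    m = np.asarray(m_samples, dtype=float)
    m2 = np.mean(m ** 2)
    m4 = np.mean(m ** 4)
    mean_abs = np.mean(np.abs(m))
    chi = n_symbols * (m2 - mean_abs ** 2) / kT
    binder = 1.0 - m4 / (3.0 * m2 ** 2)
    return mean_abs, chi, binder


def rescale_for_collapse(kT, chi, n_symbols, kT_c, nu, gamma):
    """Rescale (kT, chi) with the standard power-law finite-size ansatz.

    Curves for different N should fall on one master curve if the
    ansatz and the chosen (kT_c, nu, gamma) are appropriate.
    """
    x = (np.asarray(kT, dtype=float) - kT_c) * n_symbols ** (1.0 / nu)
    y = np.asarray(chi, dtype=float) * n_symbols ** (-gamma / nu)
    return x, y


# Synthetic magnetization samples, for illustration only.
rng = np.random.default_rng(0)
disordered = rng.normal(0.0, 0.05, size=200_000)  # single peak at m = 0
ordered = np.array([0.9, -0.9] * 1000)            # two sharp peaks at +/-0.9

# Under the assumed convention, U is close to 0 for Gaussian fluctuations
# around m = 0 and equals 2/3 for a sharply two-peaked distribution.
print(summarize_magnetization(disordered, n_symbols=4096, kT=1.160))
print(summarize_magnetization(ordered, n_symbols=4096, kT=0.760))

# Placeholder susceptibility values at three temperatures for N = 64.
kT_grid = np.array([0.90, 0.96, 1.02])
chi_placeholder = np.array([1.0, 2.0, 1.5])
x, y = rescale_for_collapse(kT_grid, chi_placeholder, n_symbols=64,
                            kT_c=0.960, nu=2.6250, gamma=2.05)
print(x, y)
```

In practice one would apply `rescale_for_collapse` to the measured susceptibility for each sentence length $N$ and plot all rescaled curves together; the exponents are then judged by how well the curves overlap, which matches the caption's remark that $\nu$ and $\gamma$ are chosen such that the scaling assumption holds.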