Phase transition on a context-sensitive random language model with short range interactions

Yuma Toji, Jun Takahashi, Vwani Roychowdhury, Hideyuki Miyahara

Abstract

Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii--Kosterlitz--Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.

Paper Structure

This paper contains 29 sections, 12 equations, 49 figures, 2 tables.

Figures (49)

  • Figure 1: Ranking of symbol appearance frequencies for (upper) $K=100$ and (lower) $K=500$, plotted on log--log scales. The critical temperature in both cases is $k_{\mathrm{B}} T \simeq 0.42$. We set $J = 1.0$, $q = 10^{-1.0}$, $t = 0.0$, and $\epsilon = 0.00$. The curves for different $k_{\mathrm{B}} T$ are overlaid.
  • Figure 2: A schematic diagram of the vectors $\bm{e}_{k}$ for $K=2,3,$ and $4$. They sum to the zero vector, and the inner products of any two different vectors are constant.
  • Figure 3: Temperature dependence of (upper) the magnetization, Eq. \ref{main_eq_magnetization_001_001}, (middle) the susceptibility, Eq. \ref{main_eq_susceptibility_001_001}, and (lower) the Binder parameter, Eq. \ref{main_eq_Binder_parameter_001_001}. We set $K = 20, J = 1.0, q = 10^{-2.0}, t = 0.0$, and $\epsilon = 0.00$. We show the results for various system sizes $N = 16, 32, \dots, 4096$, and the curves for different $N$ are overlaid in each panel.
  • Figure 4: The correlation function, Eq. \ref{main_eq_correlation_function_001_002}, with $i = \lfloor N/4 \rfloor$ and $j = \lfloor 3N/4 \rfloor - 1$. We set $K = 20, J = 1.0, q = 10^{-2.0}, t = 0.0$, and $\epsilon = 0.00$. We show the results for various $k_{\mathrm{B}} T = 0.1, 0.2, \dots, 2.0$, and the curves for different $k_{\mathrm{B}} T$ are overlaid.
  • Figure 5: Finite-size scaling of $\tilde{\chi}$ with $T_{\mathrm{c}} = 0.24, \nu = 2.50$, and $\gamma = 2.00$. We set $K = 20, J = 1.0, q = 10^{-2.0}, t = 0.0$, and $\epsilon = 0.00$. We varied $N = 64, 128, \dots, 4096$.
  • ...and 44 more figures
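
The two properties stated in the Figure 2 caption (the vectors $\bm{e}_{k}$ sum to the zero vector, and any two different vectors have the same inner product) can be checked with a small sketch. The construction below is one standard way to obtain such vectors and is not taken from the paper: it centers the $K$ standard basis vectors of $\mathbb{R}^{K}$ and normalizes them to unit length. The unit normalization, the embedding in $\mathbb{R}^{K}$ rather than $\mathbb{R}^{K-1}$, and the names `simplex_vectors` and `dot` are assumptions made for illustration.

```python
import math


def simplex_vectors(K):
    """Return K unit vectors in R^K that sum to zero and have equal pairwise inner products.

    Assumed construction (not from the paper): subtract the centroid from each
    standard basis vector, then rescale to unit length.
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    # Length of a centered basis vector: sqrt((K - 1) / K).
    norm = math.sqrt((K - 1) / K)
    return [
        [((1.0 if j == k else 0.0) - 1.0 / K) / norm for j in range(K)]
        for k in range(K)
    ]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    for K in (2, 3, 4):
        vecs = simplex_vectors(K)
        # Component-wise sum over all K vectors; should vanish up to rounding.
        total = [sum(col) for col in zip(*vecs)]
        # Inner products of all distinct pairs; should all coincide.
        pair_products = [dot(vecs[a], vecs[b]) for a in range(K) for b in range(a + 1, K)]
        print(K, max(abs(x) for x in total), min(pair_products), max(pair_products))
```

With this normalization the common inner product of two different vectors is $-1/(K-1)$, so for $K=2$ the two vectors are antiparallel, and for $K=3$ they sit at $120^\circ$ to one another.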