Critical Phase Transition in Large Language Models

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

TL;DR

An extensive analysis of texts generated by LLMs suggests that a phase transition occurs as the temperature parameter is varied, pointing to a meaningful analogy between LLMs and natural phenomena.

Abstract

Large Language Models (LLMs) have demonstrated impressive performance. To understand their behavior, we need to take into account that LLMs sometimes show qualitative changes. The natural world also exhibits such changes, called phase transitions, which are defined by singular, divergent statistical quantities. An intriguing question is therefore whether the qualitative changes in LLMs are phase transitions. In this work, we conduct an extensive analysis of texts generated by LLMs and suggest that a phase transition occurs in LLMs when the temperature parameter is varied. Specifically, statistical quantities show divergent behavior precisely at the point separating the low-temperature regime, where LLMs generate sentences with clear repetitive structures, from the high-temperature regime, where generated sentences are often incomprehensible. In addition, critical behaviors near the phase transition point, such as a power-law decay of correlation and slow convergence toward the stationary state, are similar to those in natural languages. Our results suggest a meaningful analogy between LLMs and natural phenomena.
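
The temperature parameter referred to above is usually applied by dividing the next-token logits by $T$ before the softmax. The following is a minimal sketch of that standard rescaling, not the authors' code: the logits are hypothetical values chosen for illustration, and `temperature_softmax` and `entropy` are helper names introduced here.

```python
import numpy as np

def temperature_softmax(logits, T):
    """Turn logits into sampling probabilities at temperature T > 0."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical next-token logits, for illustration only.
logits = np.array([2.0, 1.0, 0.0, -1.0])

for T in (0.3, 1.0, 1.7):
    p = temperature_softmax(logits, T)
    print(f"T={T}: p={p.round(3)}, entropy={entropy(p):.3f}")
```

Low $T$ concentrates probability on the most likely token, which fits the repetitive low-temperature regime described in the abstract, while high $T$ flattens the distribution.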

Paper Structure

This paper contains 20 sections, 3 equations, and 30 figures.

Figures (30)

  • Figure 1: Schematic pictures of phase transitions and critical phenomena in physics and LLMs: (A) Phase transition in a ferromagnetic Ising model, showing how susceptibility exhibits a singularity as a function of temperature in the infinite system size limit. This singular point, which is called the phase transition point, separates the parameter space into ordered and disordered phases. (B) A conjectured relation between LLMs and natural languages within a parameter space where each element represents a distribution of sequences.
  • Figure 2: Correlation $C(t, t+\Delta t) = C_{\text{PROPN},\text{PROPN}}(t, t+\Delta t)$ at (A) $T=0.3$, (B) $T=1$, and (C) $T=1.7$ as a function of time interval $\Delta t$, where the sequence length is $N = 512$. Points where the correlation becomes zero have been omitted from the plot to avoid divergences in the logarithmic scale and to focus on significant correlations.
  • Figure 3: (A) Integrated correlation $\tau = \tau_{\text{PROPN}, {\text{PROPN}}}$ as a function of temperature $T$ for various sequence lengths $N$. (B) The same quantity as a function of sequence length $N$ for various temperatures $T$. The black line represents a line proportional to $N$.
  • Figure 4: Power spectrum $S = S_{\text{PROPN}}$ of POS sequences as a function of $\omega$ at (A) $T=0.3$, (B) $T=1$, and (C) $T=1.7$. At $T=0.3$, $S(\omega)$ has many peaks. These peaks disappear at around $T=1$. At $T = 1.7$, $S(\omega)$ is featureless.
  • Figure 5: Probability $v(t) = v_{\text{PROPN}}(t)$ that the $t$-th tag is PROPN as a function of time $t$ at $T=0.1$, $0.5$, $0.9$, $1$, $1.1$, $1.2$, $1.5$, and $2$, where the sequence length is $N=512$. At $T = 0.1$, $0.5$, $1.5$, and $2$, $v(t)$ rapidly reaches its limiting value within $t \lesssim 100$, whereas it takes much longer to converge at $T = 0.9$, $1$, $1.1$, and $1.2$.
  • ...and 25 more figures
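
The quantities named in the captions above can be estimated from an ensemble of sampled POS-tag sequences. The sketch below is a hedged illustration only, because the paper's equations are not reproduced on this page. It assumes that $C(t, t+\Delta t)$ is a connected correlation of tag indicators averaged over samples, that $\tau$ is the sum of $|C|$ over $\Delta t \geq 1$, and that $S(\omega)$ is the sample-averaged squared Fourier amplitude divided by $N$. All function names are hypothetical, and the input is synthetic random tags rather than LLM output.

```python
import numpy as np

def indicator(tags, tag="PROPN"):
    """1.0 where the POS tag equals `tag`; input shape (M samples, N positions)."""
    return (np.asarray(tags) == tag).astype(float)

def tag_probability(x):
    """v(t): fraction of samples whose t-th tag matches."""
    return x.mean(axis=0)

def correlation(x, t=0):
    """Assumed connected correlation C(t, t+dt) for dt = 0, ..., N-1-t."""
    v = x.mean(axis=0)
    joint = (x[:, [t]] * x[:, t:]).mean(axis=0)
    return joint - v[t] * v[t:]

def integrated_correlation(x, t=0):
    """Assumed tau: sum of |C(t, t+dt)| over dt >= 1."""
    return float(np.abs(correlation(x, t)[1:]).sum())

def power_spectrum(x):
    """Assumed S(omega): sample mean of |FFT|^2 / N at non-negative frequencies."""
    n = x.shape[1]
    return (np.abs(np.fft.rfft(x, axis=1)) ** 2).mean(axis=0) / n

# Synthetic stand-in for tagged LLM samples: 2000 sequences of length 64.
rng = np.random.default_rng(0)
tags = rng.choice(["PROPN", "NOUN", "VERB"], size=(2000, 64))
x = indicator(tags)

print("v(0)  =", tag_probability(x)[0])
print("tau   =", integrated_correlation(x))
print("S[:3] =", power_spectrum(x)[:3])
```

For independent random tags, the correlation at $\Delta t \geq 1$ is zero up to sampling noise. With real samples, one would replace `tags` with the output of a POS tagger applied to texts generated at each temperature $T$.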