Lines of Thought in Large Language Models
Raphaël Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls
TL;DR
This paper treats large language models as dynamical systems and studies the trajectories, or lines of thought (LoT), that embedded prompts trace through latent space as they pass through successive transformer layers. It shows that ensembles of independent LoTs cluster along a low-dimensional, non-Euclidean manifold and can be described by a stochastic model with a small number of parameters. The authors propose a discrete-time linear update consisting of a rotation and a stretch plus a Gaussian residual, then generalize it to continuous time via Langevin dynamics and a Fokker-Planck formulation. They validate the approach on GPT-2 and other models (Llama 2, Mistral, Llama 3.2), revealing both robust transport patterns and last-layer anomalies under reinitialization or fine-tuning. These results offer a compact, probabilistic description of high-dimensional transformer computations, with potential implications for interpretability, model diagnostics, and future hybrid architectures that separate deterministic transport from stochastic, meaning-bearing variability.
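To make the discrete-time picture concrete, the following is a minimal sketch of one way a "rotation and stretch plus Gaussian residual" update could be simulated on a toy ensemble. It is an illustration under stated assumptions, not the authors' fitted model: the names `random_rotation` and `step`, the dimension, the stretch range, and the noise scale `sigma` are all hypothetical, and the rotations here are drawn at random rather than extracted from a trained transformer.

```python
import numpy as np


def random_rotation(dim, rng):
    """Draw a random rotation matrix (orthogonal, determinant +1).

    Assumption: random rotations stand in for whatever per-layer
    rotation would be estimated from real model activations.
    """
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))      # flip column signs; q stays orthogonal
    if np.linalg.det(q) < 0:         # force a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def step(x, rotation, stretch, sigma, rng):
    """Advance an ensemble x of shape (n_traj, dim) by one layer.

    Deterministic part: stretch each coordinate, then rotate.
    Stochastic part: isotropic Gaussian residual of scale sigma.
    """
    deterministic = (x * stretch) @ rotation.T
    noise = sigma * rng.standard_normal(x.shape)
    return deterministic + noise


rng = np.random.default_rng(0)
dim, n_traj, n_layers = 8, 1000, 12   # toy sizes, chosen arbitrarily

x = rng.standard_normal((n_traj, dim))
trajectory = [x]
for _ in range(n_layers):
    rotation = random_rotation(dim, rng)
    stretch = rng.uniform(0.9, 1.2, size=dim)   # hypothetical stretch factors
    x = step(x, rotation, stretch, sigma=0.1, rng=rng)
    trajectory.append(x)

trajectory = np.stack(trajectory)   # shape: (n_layers + 1, n_traj, dim)
```

Each row of `trajectory[:, i, :]` plays the role of one line of thought. Separating the update into a deterministic transport term and a noise term mirrors the decomposition described above, so the two contributions can be varied independently in experiments.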
Abstract
Large Language Models achieve next-token prediction by transporting a vectorized piece of text (prompt) across an accompanying embedding space under the action of successive transformer layers. The resulting high-dimensional trajectories realize different contextualization, or 'thinking', steps, and fully determine the output probability distribution. We aim to characterize the statistical properties of ensembles of these 'lines of thought.' We observe that independent trajectories cluster along a low-dimensional, non-Euclidean manifold, and that their path can be well approximated by a stochastic equation with few parameters extracted from data. We find it remarkable that the vast complexity of such large models can be reduced to a much simpler form, and we reflect on implications.
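As a rough illustration of what "low-dimensional" can mean for an ensemble of points in a high-dimensional space, the sketch below applies a singular value decomposition to synthetic data and reports how much variance the leading directions capture. Everything here is an assumption for illustration: the data are synthetic rather than real transformer activations, the helper `explained_variance_ratio` is hypothetical, and a linear decomposition only bounds the linear dimension, so it does not capture the non-Euclidean (curved) structure described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_traj, dim, latent_dim = 500, 64, 4   # toy sizes, chosen arbitrarily

# Synthetic ensemble: points lying near a latent_dim-dimensional subspace
# of a dim-dimensional space, plus a small amount of isotropic noise.
basis = rng.standard_normal((latent_dim, dim))
points = rng.standard_normal((n_traj, latent_dim)) @ basis
points = points + 0.01 * rng.standard_normal((n_traj, dim))


def explained_variance_ratio(points):
    """Fraction of total variance along each singular direction."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


ratios = explained_variance_ratio(points)
cumulative = np.cumsum(ratios)

# Number of directions needed to capture 99% of the variance.
n_needed = int(np.searchsorted(cumulative, 0.99) + 1)
print(f"directions needed for 99% of variance: {n_needed} of {dim}")
```

On real activations one would replace `points` with the ensemble of hidden states at a given layer; the same cumulative-variance curve then indicates how many directions carry most of the spread.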
