How to Tame Your LLM: Semantic Collapse in Continuous Systems
C. M. Wyss
TL;DR
The paper proposes the Semantic Characterization Theorem (SCT), which shows that LLMs operating in a continuous latent space exhibit discrete symbolic semantics via spectral lumpability and o-minimal definability. By modeling LLMs as Continuous State Machines with a transfer operator P, it proves that a finite set of dominant eigenfunctions induces semantic basins that align with definable, low-complexity cells. The two-pronged argument (spectral analysis and logical tameness) demonstrates that discrete semantics emerge from continuous computation and that the spectral and definable partitions are equivalent up to measure-zero boundaries. Empirically, diffusion-based experiments on sentence embeddings reveal a triad of dominant semantic dimensions, defensible basins, and an ontological skeleton, supporting the SCT and suggesting practical avenues for prompting and interpretable AI design.
Abstract
We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes the propagation of semantic mass. Under mild regularity assumptions (compactness, ergodicity, bounded Jacobian), $P$ is compact with discrete spectrum. Within this setting, we prove the Semantic Characterization Theorem (SCT): the leading eigenfunctions of $P$ induce finitely many spectral basins of invariant meaning, each definable in an o-minimal structure over $\mathbb{R}$. Thus spectral lumpability and logical tameness coincide. This explains how discrete symbolic semantics can emerge from continuous computation: the continuous activation manifold collapses into a finite, logically interpretable ontology. We further extend the SCT to stochastic and adiabatic (time-inhomogeneous) settings, showing that slowly drifting kernels preserve compactness, spectral coherence, and basin structure.
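As a rough illustration of how leading eigenfunctions of a transfer operator can carve a continuous space into discrete basins, the sketch below builds a finite, row-stochastic stand-in for $P$ from a point cloud and splits the points by the sign of the second eigenvector. Everything here is an assumption made for illustration, not the paper's construction: the Gaussian kernel, the `bandwidth` value, the two synthetic 2-D clouds standing in for embeddings, and the helper names `gaussian_kernel`, `leading_spectrum`, and `two_basins`.

```python
import numpy as np

def gaussian_kernel(points, bandwidth):
    """Symmetric Gaussian affinity matrix between all pairs of points."""
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def leading_spectrum(kernel, k):
    """Top-k eigenpairs of the Markov matrix P = D^{-1} K.

    P is similar to the symmetric matrix D^{-1/2} K D^{-1/2}, so we
    diagonalize that with eigh and map the eigenvectors back.
    """
    degree = kernel.sum(axis=1)
    symmetric = kernel / np.sqrt(np.outer(degree, degree))
    evals, evecs = np.linalg.eigh(symmetric)      # ascending order
    order = np.argsort(evals)[::-1][:k]           # keep the k largest
    right_evecs = evecs[:, order] / np.sqrt(degree)[:, None]
    return evals[order], right_evecs

def two_basins(kernel):
    """Label each point by the sign of the second eigenvector of P."""
    evals, evecs = leading_spectrum(kernel, 2)
    return evals, (evecs[:, 1] > 0).astype(int)

# Hypothetical data: two well-separated clouds standing in for embeddings.
rng = np.random.default_rng(0)
cloud_a = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(40, 2))
cloud_b = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(40, 2))
points = np.vstack([cloud_a, cloud_b])

evals, labels = two_basins(gaussian_kernel(points, bandwidth=1.0))
print("leading eigenvalues:", evals)
print("basin labels:", labels)
```

In this toy setting the leading eigenvalue is 1 (the stationary mode) and the second eigenvector is nearly constant on each cloud with opposite signs, so thresholding it at zero recovers the two clouds as two basins. With more than two basins one would typically cluster on several leading eigenvectors rather than a single sign.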
