How to Tame Your LLM: Semantic Collapse in Continuous Systems

C. M. Wyss

TL;DR

The paper proposes the Semantic Characterization Theorem (SCT), showing that LLMs operating in a continuous latent space exhibit discrete symbolic semantics via spectral lumpability and o-minimal definability. By modeling LLMs as Continuous State Machines with a transfer operator $P$, it proves that a finite set of dominant eigenfunctions induces semantic basins that align with definable, low-complexity cells. The two-pronged argument (spectral analysis and logical tameness) demonstrates that discrete semantics emerge from continuous computation and that the resulting partitions are equivalent up to measure-zero boundaries. Empirically, diffusion-based experiments on sentence embeddings reveal a triad of dominant semantic dimensions, defensible basins, and an ontological skeleton, supporting the SCT and suggesting practical avenues for prompting and interpretable AI design.

Abstract

We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes the propagation of semantic mass. Under mild regularity assumptions (compactness, ergodicity, bounded Jacobian), $P$ is compact with discrete spectrum. Within this setting, we prove the Semantic Characterization Theorem (SCT): the leading eigenfunctions of $P$ induce finitely many spectral basins of invariant meaning, each definable in an o-minimal structure over $\mathbb{R}$. Thus spectral lumpability and logical tameness coincide. This explains how discrete symbolic semantics can emerge from continuous computation: the continuous activation manifold collapses into a finite, logically interpretable ontology. We further extend the SCT to stochastic and adiabatic (time-inhomogeneous) settings, showing that slowly drifting kernels preserve compactness, spectral coherence, and basin structure.
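To make the idea of spectral basins concrete, the following is a minimal finite-dimensional sketch, not the paper's construction or experimental setup. It assumes a Gaussian diffusion kernel on a handful of one-dimensional points as a stand-in for the transfer operator $P$, and it assigns basins by the sign of the second eigenfunction. The function names, the bandwidth, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def diffusion_kernel(points, bandwidth=1.0):
    """Gaussian kernel matrix and its row sums (assumed stand-in for P)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return kernel, kernel.sum(axis=1)


def leading_eigenpairs(kernel, degree, k):
    """Top-k eigenvalues and right eigenvectors of P = D^{-1} K.

    P is similar to the symmetric matrix D^{-1/2} K D^{-1/2}, so a
    symmetric eigensolver can be used and the spectrum is real.
    """
    inv_sqrt = 1.0 / np.sqrt(degree)
    symmetric = kernel * inv_sqrt[:, None] * inv_sqrt[None, :]
    values, vectors = np.linalg.eigh(symmetric)  # ascending order
    order = np.argsort(values)[::-1][:k]
    # Map eigenvectors of the symmetric matrix back to those of P.
    return values[order], vectors[:, order] * inv_sqrt[:, None]


def two_basin_labels(points, bandwidth=1.0):
    """Split points into two basins by the sign of the second eigenfunction."""
    kernel, degree = diffusion_kernel(points, bandwidth)
    values, functions = leading_eigenpairs(kernel, degree, k=2)
    return values, (functions[:, 1] > 0).astype(int)


# Toy data: two well-separated groups of points on the real line.
points = np.concatenate(
    [np.linspace(-0.5, 0.5, 10), np.linspace(3.5, 4.5, 10)]
)[:, None]
eigenvalues, labels = two_basin_labels(points)
print("leading eigenvalues:", eigenvalues)
print("basin labels:", labels)
```

In this toy setting the leading eigenvalue is 1 (the stationary mode), the second eigenvalue sits close to 1 because mass rarely moves between the two groups, and the sign of the second eigenfunction recovers the two groups as discrete labels. Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector.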
