Probing the topology of the space of tokens with structured prompts
Michael Robinson, Sourya Dey, Taisa Kushner
TL;DR
The paper addresses the problem of recovering the hidden token input embedding subspace $T$ inside the latent space $X$ of an LLM, and relates the topology of $T$ to model behavior. It introduces a general structured prompting method and proves that, under transversality conditions on the autoregressive map and a measurement map, the collected data recovers $T$ up to homeomorphism. Empirical results on Llemma-7B indicate a stratified token subspace with a base of dimension roughly 5–10 and a high-dimensional fiber, and suggest that a low-dimensional embedding into $\mathbb{R}^{90}$ can preserve the topology. The approach extends to general nonlinear autoregressive processes, offering a principled topological lens for analyzing and interpreting black-box sequence models.
Abstract
This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism. Moreover, this article provides strong theoretical justification -- a mathematical proof for generic LLMs -- for why this method should be expected to work. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of Llemma-7B. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.
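As a rough intuition for what "recovering the token subspace up to homeomorphism" means, the following toy sketch may help. It is not the procedure used in the paper: the circle-shaped hidden token set, the random two-layer map standing in for a black-box model, and all sizes and names (`hidden`, `next_token_probs`, `measurements`) are assumptions made purely for illustration. The sketch queries the toy model once per token, records the output probability vector, and checks that distinct tokens remain distinct and that neighbors along the hidden circle remain neighbors in the measured data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden token subspace T: 200 tokens on a circle in a 2-D latent space.
n_tokens = 200
angles = np.sort(rng.uniform(0.0, 2.0 * np.pi, n_tokens))
hidden = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Toy stand-in for a black-box model: a smooth nonlinear map from the latent
# space to a probability vector over 8 outputs (weights are random assumptions).
W1 = rng.normal(size=(2, 16))
W2 = rng.normal(size=(16, 8))

def next_token_probs(x):
    """Return softmax probabilities for each row of latent vectors x."""
    logits = np.tanh(x @ W1) @ W2
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

# "Prompt" the model with every token and record only its observable output.
measurements = next_token_probs(hidden)

def pairwise_distances(points):
    """Euclidean distance matrix with the diagonal set to infinity."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d

dist = pairwise_distances(measurements)

# Injectivity on the sample: no two tokens produce the same measurement.
min_separation = float(dist.min())

# Neighborhood preservation: the nearest measured neighbor of token i should be
# one of its two neighbors along the hidden circle (indices i-1 or i+1, cyclically).
nearest = dist.argmin(axis=1)
offset = (nearest - np.arange(n_tokens)) % n_tokens
adjacent_fraction = float(np.mean((offset == 1) | (offset == n_tokens - 1)))

print(f"minimum separation between measurements: {min_separation:.3e}")
print(f"fraction of tokens whose nearest measured neighbor is adjacent: {adjacent_fraction:.2f}")
```

In this toy setting the hidden coordinates are never read after the model is queried, except to know the cyclic ordering used in the check. A high adjacent fraction is consistent with the measured point cloud being a topological copy of the hidden circle, which is the kind of guarantee the paper establishes rigorously under its transversality conditions.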
