Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling
Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, Iryna Gurevych
TL;DR
Supervised Multi-Dimensional Scaling (SMDS) is a supervised extension of multidimensional scaling that identifies geometry-aligned subspaces in language model activations, enabling systematic discovery of feature manifolds. In a temporal-reasoning case study, the authors show that temporal concepts organize into interpretable structures (lines, circles, clusters) that persist across model families and adapt to task context. They test whether these manifolds are causally involved in reasoning by perturbing manifold-aligned subspaces and by correlating manifold quality with downstream performance. They also show that the approach generalizes to multidimensional manifolds and to geography, suggesting a broader role for entity-based, geometry-driven reasoning in LMs. The work positions manifold geometry as a core component of mechanistic interpretability and offers a scalable tool for diagnosing and analyzing how representations support reasoning in modern language models.
Abstract
The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior efforts focus on discovering specific geometries for specific features, and thus lack generalization. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method to automatically discover feature manifolds. We apply SMDS to temporal reasoning as a case study, finding that different features form various geometric structures such as circles, lines, and clusters. SMDS reveals several insights into these structures: they consistently reflect the properties of the concepts they represent; they are stable across model families and sizes; they actively support reasoning in models; and they dynamically reshape in response to context changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.
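The abstract does not spell out the SMDS algorithm, so the following is only a minimal sketch of how a supervised MDS-style procedure could work, not the authors' implementation. It assumes that (a) feature labels define ideal pairwise distances under a hypothesized geometry, (b) classical MDS turns those distances into target coordinates, and (c) a linear map from activations to those coordinates is fit by least squares. All function names (`label_distances`, `classical_mds`, `fit_projection`, `stress`) are hypothetical.

```python
import numpy as np

def label_distances(y, geometry="linear", period=None):
    """Ideal pairwise distances implied by a hypothesized geometry (assumption)."""
    y = np.asarray(y, dtype=float)
    if geometry == "linear":
        return np.abs(y[:, None] - y[None, :])
    if geometry == "circular":
        if period is None:
            raise ValueError("circular geometry needs a period")
        theta = 2 * np.pi * y / period
        # Chord length between two points on the unit circle.
        return 2 * np.abs(np.sin((theta[:, None] - theta[None, :]) / 2))
    raise ValueError(f"unknown geometry: {geometry}")

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # keep the k largest
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

def fit_projection(X, Z):
    """Least-squares linear map from centered activations X to targets Z."""
    Xc = X - X.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Z, rcond=None)
    return W

def stress(X, W, D):
    """Normalized mismatch between projected distances and ideal distances."""
    P = (X - X.mean(axis=0)) @ W
    Dp = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    return np.sqrt(((Dp - D) ** 2).sum() / (D ** 2).sum())
```

Under these assumptions, one could compare candidate geometries for a feature (for example, months of the year as a line versus a circle) by fitting a projection for each and keeping the hypothesis with the lowest stress, ideally measured on held-out samples to avoid rewarding overfitting.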
