Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling

Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, Iryna Gurevych

TL;DR

SMDS is introduced as a supervised extension of multidimensional scaling that identifies geometry-aligned subspaces in language model activations, enabling systematic discovery of feature manifolds. Through a temporal-reasoning case study, the authors show that temporal concepts organize into interpretable structures (linear, circular, clusters) that persist across model families and adapt with task context. They validate that these manifolds are causally involved in reasoning by perturbing manifold-aligned subspaces and correlating manifold quality with downstream performance, and they demonstrate generalization to multidimensional manifolds and geography, suggesting a broader role for entity-based, geometry-driven reasoning in LMs. The work positions manifold geometry as a core component of mechanistic interpretability and offers a scalable diagnostic and analysis tool for probing how representations support reasoning in modern language models.

Abstract

The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior efforts focus on discovering specific geometries for specific features, and thus lack generalization. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method to automatically discover feature manifolds. We apply SMDS to temporal reasoning as a case study, finding that different features form various geometric structures such as circles, lines, and clusters. SMDS reveals many insights on these structures: they consistently reflect the properties of the concepts they represent; are stable across model families and sizes; actively support reasoning in models; and dynamically reshape in response to context changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.
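
The abstract does not spell out the SMDS procedure, so the following is only a minimal sketch of one way a supervised variant of multidimensional scaling could be set up, not the authors' implementation. It assumes three steps: (1) turn labels into ideal pairwise distances under a hypothesised geometry, (2) embed those distances with classical MDS, and (3) fit a linear map from activations to that embedding by least squares, scoring the fit with a normalised stress. The function names (`ideal_distances`, `classical_mds`, `fit_and_score`), the stress formula, and the synthetic "month" activations are all hypothetical and chosen for illustration.

```python
import numpy as np


def ideal_distances(labels, geometry, period=None):
    """Pairwise label distances under a hypothesised geometry (assumed forms)."""
    y = np.asarray(labels, dtype=float)
    gap = np.abs(y[:, None] - y[None, :])
    if geometry == "linear":
        return gap
    if geometry == "cyclical":
        if period is None:
            raise ValueError("cyclical geometry needs a period")
        # Chord length between two points on a unit circle.
        return 2.0 * np.abs(np.sin(np.pi * gap / period))
    if geometry == "categorical":
        # Every pair of distinct labels is equally far apart.
        return (gap > 0).astype(float)
    raise ValueError(f"unknown geometry: {geometry}")


def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def fit_and_score(X, D, k=2):
    """Fit a linear map from activations X to the MDS target; return (W, stress)."""
    Y = classical_mds(D, k)
    Xc = X - X.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    Z = Xc @ W
    Dz = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    stress = np.sum((Dz - D) ** 2) / np.sum(D ** 2)
    return W, stress


# Synthetic stand-in for LM activations: months placed on a circle that is
# embedded in a random 2-D plane of a 16-D space, plus small Gaussian noise.
rng = np.random.default_rng(0)
months = np.tile(np.arange(12), 10)
angles = 2 * np.pi * months / 12
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
X = circle @ rng.normal(size=(2, 16)) + 0.01 * rng.normal(size=(120, 16))

scores = {
    g: fit_and_score(X, ideal_distances(months, g, period=12))[1]
    for g in ("linear", "cyclical", "categorical")
}
best = min(scores, key=scores.get)
print(scores, best)
```

Under these assumptions, the geometry with the lowest stress is taken as the best description of how the feature is laid out. Because the synthetic data is built on a circle, the cyclical hypothesis should fit it most closely; on real activations the comparison would need held-out data to avoid rewarding overfitting.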

Paper Structure

This paper contains 41 sections, 18 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: Our contributions. Supervised Multi-Dimensional Scaling is a novel dimensionality reduction technique to identify subspaces with a known geometry (left). Using it, we show evidence that temporal entities in LMs form various types of feature manifolds, which are task & prompt dependent and support reasoning (right).
  • Figure 2: Feature Manifold Discovery and the Limitations of Previous Methods. (a) Prompt and task setting. (b) LDA, PCA, and PLS either fail to recover structure or order due to their limitations. (c) SMDS succeeds.
  • Figure 3: Feature manifolds retrieved from the lp site. We can observe that models represent features in a similar way, and the resulting manifolds are interpretable and match an intuitive progression (linear, circular or categorical) of the underlying features. The scatter plots on the left show the first two components of SMDS dimensionality reduction; the bar plots on the right depict scoring of different manifolds on the given activations. Scores displayed are computed as $-\log S$ to emphasise the difference between values; error bars are shown in black. Bar plot colour reflects manifold topology: linear, cyclical, or categorical.
  • Figure 4: Llama-3.2-3B-Instruct on the periodic task. Events display logarithmic compression in their frequency: long intervals (e.g., months, years) are represented with the same granularity as shorter ones (e.g., days, weeks).
  • Figure 5: Feature manifolds of Llama-3.2-3B-Instruct on the date task and its variants. Different continuations produce drastically different topologies.
  • ...and 11 more figures