Emergent Manifold Separability during Reasoning in Large Language Models

Alexandre Polo, Chanwoo Chun, SueYeon Chung

TL;DR

This work investigates the temporal dynamics of representation geometry in Large Language Models during Chain-of-Thought reasoning. It applies Manifold Capacity Theory (MCT) to a compositional Boolean logic task, using MCT to quantify the linear separability of latent representations without the confounds introduced by training probes.

Abstract

Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to a compositional Boolean logic task, which allows us to quantify the linear separability of latent representations without the confounds introduced by training probes. Our analysis reveals that reasoning manifests as a transient geometric pulse: concept manifolds are untangled into linearly separable subspaces immediately before computation and rapidly compressed afterward. This behavior diverges from standard linear probe accuracy, which remains high long after computation, suggesting a fundamental distinction between information that is merely retrievable and information that is geometrically prepared for processing. We interpret this phenomenon as Dynamic Manifold Management, a mechanism by which the model modulates representational capacity to optimize the bandwidth of the residual stream throughout the reasoning chain.
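The central quantity above is manifold capacity α. MCT estimates it analytically with a mean-field theory; as a rough intuition aid only, the sketch below estimates it by brute-force simulation instead. It randomly projects P point-cloud manifolds to n dimensions, draws random ±1 labelings of whole manifolds, and reports α = P/N_c, where N_c is the smallest n at which at least half of the labelings are linearly separable. This is not the paper's estimator: the function names, the perceptron-based separability test (which treats non-convergence as inseparability), the bias term, and the 50% threshold are all assumptions.

```python
import random


def perceptron_separable(points, labels, max_epochs=100):
    """Approximate linear-separability test for a hyperplane with bias.

    Returns True if the perceptron rule separates the data within max_epochs
    passes. Non-convergence is *treated* as non-separable, which only
    approximates an exact LP / hard-margin SVM feasibility check.
    """
    aug = [list(x) + [1.0] for x in points]  # constant 1 acts as the bias input
    w = [0.0] * len(aug[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xa, y in zip(aug, labels):
            if y * sum(wi * xi for wi, xi in zip(w, xa)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, xa)]  # perceptron update
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def simulated_capacity(manifolds, n_trials=10, seed=0):
    """Simulated capacity alpha = P / N_c (hypothetical stand-in for MCT).

    manifolds: list of P point clouds, each a list of D-dimensional points.
    N_c is the smallest random-projection dimension n at which at least half
    of the random +/-1 dichotomies of the manifolds are linearly separable.
    Returns None if this never happens for n <= D.
    """
    rng = random.Random(seed)
    P, D = len(manifolds), len(manifolds[0][0])
    for n in range(1, D + 1):
        hits = 0
        for _ in range(n_trials):
            # One shared Gaussian projection D -> n for this trial.
            proj = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]
            # Random dichotomy: every point of manifold i shares label ys[i].
            ys = [rng.choice((-1, 1)) for _ in range(P)]
            pts, lbls = [], []
            for manifold, y in zip(manifolds, ys):
                for x in manifold:
                    pts.append([sum(r * xi for r, xi in zip(row, x)) for row in proj])
                    lbls.append(y)
            if perceptron_separable(pts, lbls):
                hits += 1
        if hits / n_trials >= 0.5:
            return P / n
    return None
```

Compact, well-untangled manifolds should receive a higher estimate than wide, overlapping ones, mirroring the low- versus high-capacity states sketched in Figure 1. An exact feasibility test (for example a linear program) would remove the perceptron's approximation at the cost of a dependency.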

Paper Structure

This paper contains 32 sections, 2 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Manifold Untangling during Chain-of-Thought. Before any token is generated (left), the latent representations of tasks corresponding to answers A and B are entangled and hard to distinguish, resulting in low capacity. As the model generates the CoT, it progressively untangles these representations. By the final token (right), the two manifolds are linearly separable and capacity is high.
  • Figure 2: a. Example of a Boolean logic tree with height $h=2$. Left: the corresponding text input provided to the model, with internal nodes labeled [01] through [03]. Right: schematic of the tree structure. b. Accuracy on Boolean logic trees of varying height. Performance with CoT (purple) remains near-perfect ($>98\%$), while standard prompting (No CoT, orange) degrades significantly as tree depth increases, falling to $61\%$ at $h=5$. The dashed line marks the random baseline ($50\%$).
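To make the task format concrete, here is a hypothetical generator for trees like the one in Figure 2a. The node numbering (bottom-up, left to right, so the root of an $h=2$ tree is [03]) is inferred from the caption and agrees with Figure 3, where Node 11 at $h=4$ has children 5 and 6. The AND/OR operator set, the random literal leaves, and the `[id] = left OP right` line format are assumptions, not the paper's actual prompt.

```python
import random

# Hypothetical operator set; the exact operators used in the paper are not given here.
OPS = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}


def make_tree(h, seed=0):
    """Random complete Boolean tree of height h with 2**h - 1 internal nodes.

    Internal nodes are numbered bottom-up, left to right (h=2: [01] and [02]
    on the lowest level, root [03]). Returns (lines, values): one hypothetical
    "[id] = left OP right" line per node in solving order, and a dict mapping
    node id -> Boolean value.
    """
    rng = random.Random(seed)
    # Lowest-level operands are Boolean literals, stored as (label, value) pairs.
    level = [(str(v), v) for v in (rng.choice([True, False]) for _ in range(2 ** h))]
    lines, values, node_id = [], {}, 0
    for _ in range(h):
        parents = []
        # Pair adjacent operands; each pair becomes one internal node.
        for (l_lbl, l_val), (r_lbl, r_val) in zip(level[0::2], level[1::2]):
            node_id += 1
            op = rng.choice(sorted(OPS))
            value = OPS[op](l_val, r_val)
            label = f"[{node_id:02d}]"
            lines.append(f"{label} = {l_lbl} {op} {r_lbl}")
            values[node_id] = value
            parents.append((label, value))
        level = parents
    return lines, values
```

For example, `make_tree(4)` yields 15 lines; the eleventh defines [11] in terms of [05] and [06], and the final answer is `values[15]`. Emitting the lines in order gives one natural step-by-step solution trace.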
  • Figure 3: Dynamic Modulation of Manifold Geometry during CoT. a. Manifold capacity ($\alpha$) tracked across the Chain-of-Thought sequence (layer 20, tree height $h=4$; see Supplementary for $h=5$). Lines are colored by node ID. Capacity for a given node peaks sharply at two distinct moments: first when the node itself is computed, and again when its parent processes it. b. Detailed analysis of Node 11 (yellow) and its children, Node 5 (blue) and Node 6 (turquoise). Insets visualize the latent geometry at key timesteps. High-capacity peaks correspond to linearly separable manifolds (clear decision boundaries). Notably, representations become entangled (low capacity) in the interval between the initial solution (solve) and the subsequent retrieval (recall). c. Intrinsic dimensionality estimates of the latent embeddings using Two-Nearest Neighbors (TwoNN) and Participation Ratio (PR).
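Both dimensionality measures in panel c have short standard definitions. The sketch below assumes the embeddings arrive as a list of equal-length vectors. It computes the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over covariance eigenvalues via the equivalent form $\mathrm{tr}(C)^2 / \lVert C \rVert_F^2$, and the TwoNN estimate in its maximum-likelihood form $N / \sum_i \log(r_2/r_1)$. TwoNN as originally published instead fits a line to the empirical distribution of $r_2/r_1$, so numbers may differ slightly from the paper's; the function names are illustrative.

```python
import math


def participation_ratio(X):
    """PR = (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues).

    Uses tr(C)^2 / ||C||_F^2, which equals that ratio for a symmetric
    covariance C, so no eigendecomposition is needed. The 1/(N-1) scaling of
    the covariance cancels in the ratio and is omitted.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) for j in range(d)] for i in range(d)]
    trace = sum(C[i][i] for i in range(d))
    frob2 = sum(C[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / frob2


def two_nn_dimension(X):
    """TwoNN intrinsic dimension, maximum-likelihood form: N / sum(log(r2 / r1)).

    r1 and r2 are each point's first and second nearest-neighbour distances.
    Points with an exact duplicate (r1 == 0) are skipped. Brute force, O(N^2).
    """
    logs = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:
            logs.append(math.log(r2 / r1))
    return len(logs) / sum(logs)
```

PR reflects how evenly variance spreads across directions (a global, linear measure), while TwoNN uses only local neighbour distances, so the two can disagree on curved manifolds.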
  • Figure 4: Dynamics of separability metrics during Solve and Recall events. Data are aligned to the moment a node is computed (left column: a, c) or recalled by its parent (right column: b, d). (a, b) Manifold capacity shows a sharp, transient peak at computation and at recall, dropping quickly to baseline in between. (c, d) Hard-margin SVM probe test accuracy also peaks at computation but then exhibits a sustained plateau, decaying much more slowly than capacity. Vertical dashed lines mark the relative positions of structural tokens (Header, Result, Logic).
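The alignment in Figure 4 amounts to peri-event averaging. A minimal sketch, assuming each metric (capacity or probe accuracy) is already available per token for every reasoning trace: take a window around each solve or recall event and average across events. The input layout and function names are hypothetical, and `above_half_max` is an illustrative width summary, not a metric from the paper, that scores a transient pulse low and a sustained plateau high.

```python
def event_aligned_average(series, events, pre, post):
    """Average a per-token metric in a window around events.

    series: dict mapping trace id -> list of metric values over token positions.
    events: list of (trace_id, token_index) pairs, e.g. where a node is solved.
    Returns values for offsets -pre..+post; offsets falling outside a trace are
    skipped for that event (NaN if no event covers an offset).
    """
    sums = [0.0] * (pre + post + 1)
    counts = [0] * (pre + post + 1)
    for trace_id, t0 in events:
        values = series[trace_id]
        for k, offset in enumerate(range(-pre, post + 1)):
            t = t0 + offset
            if 0 <= t < len(values):
                sums[k] += values[t]
                counts[k] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def above_half_max(curve):
    """Count offsets at or above the midpoint between the curve's min and max.

    Assumes no NaN entries. A sharp pulse scores low; a plateau scores high.
    """
    lo, hi = min(curve), max(curve)
    return sum(v >= (lo + hi) / 2 for v in curve)
```

Applied to capacity and probe-accuracy traces aligned on the same events, a lower `above_half_max` for capacity would be one simple way to express the pulse-versus-plateau contrast described in the caption.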
  • Figure 5: Spatiotemporal heatmap of manifold capacity change. The x-axis represents token positions relative to the computation step ($t=0$). (During Computation, $t=0,1$) A capacity increase begins in the middle layers at $t=0$ and culminates in a sharp peak in the deep layers at the token preceding the result ($t=+2$). (Post-Computation) For $t > +2$, the early-to-mid layers keep elevated capacity as they hold the immediate context; capacity in the middle layers, however, decreases as the model reorganizes its representation to shift focus to the next node.
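A heatmap of this kind can be assembled by aligning per-layer capacity traces to each computation step and subtracting a reference value. The caption does not say what the change is measured against, so the sketch assumes, hypothetically, the token just before $t=0$ (`ref_offset=-1`); the `alpha[layer][token]` input layout and both function names are likewise assumptions.

```python
def mean_capacity_change(alpha, events, offsets, ref_offset=-1):
    """Average layer x relative-time map of capacity change over events.

    alpha: dict trace_id -> alpha[layer][token] (hypothetical layout).
    events: list of (trace_id, t0), where t0 is the computation step (t=0).
    Entry [layer][k] is the mean of alpha[layer][t0 + offsets[k]] minus
    alpha[layer][t0 + ref_offset], i.e. change relative to an assumed
    reference token.
    """
    n_layers = len(next(iter(alpha.values())))
    total = [[0.0] * len(offsets) for _ in range(n_layers)]
    for trace_id, t0 in events:
        if t0 + min(list(offsets) + [ref_offset]) < 0:
            raise IndexError("window starts before the beginning of the trace")
        for layer, row in enumerate(alpha[trace_id]):
            base = row[t0 + ref_offset]  # assumed reference token
            for k, o in enumerate(offsets):
                total[layer][k] += row[t0 + o] - base
    return [[v / len(events) for v in row] for row in total]


def peak_location(delta, offsets):
    """Return (layer, offset) of the largest mean capacity increase."""
    _, layer, k = max(
        (v, layer, k) for layer, row in enumerate(delta) for k, v in enumerate(row)
    )
    return layer, offsets[k]
```

With layers ordered shallow to deep, a peak location in a deep layer at offset $+2$ would correspond to the pattern the caption describes.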
  • ...and 11 more figures