Geometry of Decision Making in Language Models

Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi

TL;DR

The paper investigates how internal representations in transformer-based LLMs organize themselves geometrically during reasoning, using intrinsic dimension ($\mathrm{ID}$) as the core metric. Through a large-scale study of 28 open-weight models on MCQA tasks (real-world and synthetic), it uncovers a consistent three-stage ID trajectory where middle-layer ID peaks precede decisive predictions, indicating a compression into task-relevant, low-dimensional manifolds. The work demonstrates that MLP-out drives sharper ID transitions, while residual-post signals accumulate more gradually, and shows that few-shot prompting accelerates compression and decisiveness. These findings offer a geometric lens on generalization and decision formation in LLMs, with implications for interpretability and model optimization.

Abstract

Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of intrinsic dimension (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study with 28 open-weight transformer models, estimating ID across layers using multiple estimators while also quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.
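To make the notion of per-layer ID estimation concrete, the sketch below implements one common estimator, the maximum-likelihood form of TwoNN, which uses only the ratio of each point's second to first nearest-neighbor distance. The abstract says only that multiple estimators are used and does not name them, so the choice of TwoNN here is an assumption, and the function name `twonn_id` is hypothetical. In practice, `points` would be the hidden states of one layer stacked over many prompts.

```python
import numpy as np


def twonn_id(points: np.ndarray) -> float:
    """Maximum-likelihood TwoNN estimate of intrinsic dimension.

    points: array of shape (n_samples, n_features), e.g. one layer's
    hidden states collected over many prompts (an assumed setup).
    """
    x = np.asarray(points, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(x * x, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor

    # Two smallest distances per row: first and second nearest neighbors.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Under the TwoNN model, mu follows a Pareto law with exponent d,
    # so the maximum-likelihood estimate is N / sum(log mu).
    return float(mu.size / np.sum(np.log(mu)))
```

Applied to a 2-D Gaussian cloud linearly embedded in 50 dimensions, the estimate should land near 2 even though the extrinsic dimension is 50, which is the distinction the paper relies on.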

Paper Structure

This paper contains 14 sections, 17 equations, 40 figures, 9 tables.

Figures (40)

  • Figure 1: In transformer-based architectures, a vector of latent features with fixed hidden dimension $d$ is transformed by the transformer blocks $f_l$. Though the extrinsic dimension remains the same, we find that the features lie on low-dimensional manifolds of different intrinsic dimensions $\mathbb{R}^{\mathrm{id}_{l}}$. Intrinsically, there exists a mapping $\phi_l$ corresponding to each $f_l$, from $\mathbb{R}^{\mathrm{id}_{l-1}}\to \mathbb{R}^{\mathrm{id}_{l}}$. We study how these compressed manifolds align with the decision-making process in middle layers. We project the internal representations back to the vocabulary space to inspect decisiveness. A sudden shift in performance follows a sharp peak in the residual-post ID estimates.
  • Figure 2: Accuracy along with ID trends for LLaMA model variants on the MMLU STEM dataset.
  • Figure 3: Accuracy along with ID trends for LLaMA model variants on the COPA dataset.
  • Figure 4: ID of the last-layer (MLP-out) feature representation in the in-context learning setting. The box plot shows the distribution of ID across all $28$ open-weight models. Overall, we observe that ID decreases as more examples are provided in the context.
  • Figure 5: ID of residual-post hidden layers in Pythia series models evolving throughout training for the Arithmetic dataset. The red curve shows the final checkpoint for architectures of different sizes. The top row shows the quality of layer representations in the form of accuracy (log scale). Interestingly, we observe that the model starts to be decisive about the correct token where the ID shows a reverse peak (highlighted by black dashed vertical lines).
  • ...and 35 more figures
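
The Figure 1 caption mentions projecting internal representations back to the vocabulary space to inspect decisiveness. A minimal sketch of that idea for MCQA follows, assuming hidden states have already been extracted per layer and that an unembedding matrix is available. The function name `layerwise_option_accuracy`, the array layout, and the omission of any final normalization layer are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def layerwise_option_accuracy(
    hidden_by_layer: np.ndarray,   # (n_layers, n_examples, d_model)
    unembed: np.ndarray,           # (vocab_size, d_model)
    option_token_ids: list[int],   # token ids of the answer options, e.g. A-D
    gold: np.ndarray,              # (n_examples,) index into option_token_ids
) -> np.ndarray:
    """Per-layer accuracy when each layer's hidden state is read out directly.

    Hypothetical helper: every layer is projected onto the vocabulary, the
    logits are restricted to the option tokens, and the top option is
    compared with the gold answer.
    """
    # Project every layer's hidden states onto the vocabulary.
    logits = hidden_by_layer @ unembed.T              # (layers, examples, vocab)

    # Keep only the logits of the answer-option tokens.
    option_logits = logits[:, :, option_token_ids]    # (layers, examples, options)

    # The predicted option is the one with the highest logit.
    predictions = option_logits.argmax(axis=-1)       # (layers, examples)

    # Fraction of examples answered correctly at each layer.
    return (predictions == gold[None, :]).mean(axis=1)
```

Plotting the returned per-layer accuracy next to a per-layer ID estimate gives the kind of side-by-side view that the accuracy and ID figures describe.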