Geometry of Decision Making in Language Models
Abhinav Joshi, Divyanshu Bhatt, Ashutosh Modi
TL;DR
The paper investigates how internal representations in transformer-based LLMs organize themselves geometrically during reasoning, using intrinsic dimension ($\mathrm{ID}$) as the core metric. In a large-scale study of 28 open-weight models on real-world and synthetic MCQA tasks, it finds a consistent three-stage ID trajectory in which middle-layer ID peaks precede decisive predictions, indicating compression into task-relevant, low-dimensional manifolds. The work shows that MLP-out drives sharper ID transitions, whereas residual-post signals accumulate more gradually, and that few-shot prompting accelerates both compression and decisiveness. These findings offer a geometric lens on generalization and decision formation in LLMs, with implications for interpretability and model optimization.
Abstract
Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of intrinsic dimension (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study of 28 open-weight transformer models, estimating ID across layers with multiple estimators and quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest that LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.
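To make the idea of per-layer ID estimation concrete, the following is a minimal sketch of one widely used estimator, the maximum-likelihood form of TwoNN, which infers dimension from the ratio of each point's second- to first-nearest-neighbor distance. The abstract does not name the estimators used, so the choice of TwoNN here is an assumption, and the function name `twonn_id` is hypothetical.

```python
import numpy as np


def twonn_id(x):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    x: array of shape (n_points, n_features), e.g. one layer's hidden states
       for a set of prompts (an assumed input layout, not the paper's).
    """
    x = np.asarray(x, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # exclude each point's distance to itself

    # Two smallest distances per row: first and second nearest neighbors.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop duplicate points (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Under the TwoNN model, mu is Pareto-distributed with shape d,
    # so the maximum-likelihood estimate is d = N / sum(log(mu)).
    return keep.sum() / np.sum(np.log(mu))
```

Applied to a model, one would call such a function once per layer on the matrix of hidden states collected at a fixed token position, then plot the estimates against layer index to look for the expand-then-compress pattern described above. The dense distance matrix scales quadratically with the number of points, so this sketch is only suitable for a few thousand samples per layer.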
