PRISM: A Dual View of LLM Reasoning through Semantic Flow and Latent Computation

Ruidi Chang, Jiawei Zhou, Hanjie Chen

Abstract

Large language models (LLMs) solve complex problems by generating multi-step reasoning traces. Yet these traces are typically analyzed from only one of two perspectives: the sequence of tokens across different reasoning steps in the generated text, or the hidden-state vectors across model layers within one step. We introduce PRISM (Probabilistic Reasoning Inspection through Semantic and Implicit Modeling), a framework and diagnostic tool for jointly analyzing both levels, providing a unified view of how reasoning evolves across steps and layers. Across multiple reasoning models and benchmarks, PRISM uncovers systematic patterns in the reasoning process, showing that failed trajectories are more likely to become trapped in unproductive verification loops and further diverge into distinct modes such as overthinking and premature commitment, which behave differently once a candidate answer is reached. It further reveals how prompting reshapes reasoning behavior beyond aggregate accuracy by altering both semantic transitions and internal computational patterns. By modeling reasoning trajectories as structured processes, PRISM makes these behaviors observable and analyzable rather than relying solely on final-task accuracy. Taken together, these insights position PRISM as a practical tool for analyzing and diagnosing reasoning processes in LLMs.

Paper Structure (38 sections, 12 equations, 12 figures, 15 tables)

Figures (12)

  • Figure 1: PRISM combines two stages. The explicit stage models semantic category flows (Section \ref{sec:explicit}): each reasoning step is assigned a semantic role such as Setup & Retrieval (SR), Analysis & Computation (AC), Uncertainty & Verification (UV), or Final Answer (FA). The implicit stage captures computational structure within each step via Gaussian mixture models, producing latent computational regimes $z_1, z_2, \ldots$, with a cross-step bridge matrix that jointly models category transitions and internal computational patterns between consecutive steps.
  • Figure 2: 2D PCA projection of per-step activations colored by category. Diamond markers = category centroids. Separation between clusters indicates distinct hidden-state representations per category.
  • Figure 3: Difference in mean posterior probability $\gamma_{\ell,k}$ between correct and incorrect trajectories, averaged across all steps (blue = more in correct, red = more in incorrect). Columns = transformer layers (left = early, right = late), rows = regimes. Strongly colored cells in the difference plot indicate (layer, regime) pairs that discriminate correct from incorrect reasoning.
  • Figure 4: Step-level layer trajectories in PCA space (Stratos, GPQA-Diamond, UV). Each path traces the hidden state of a single reasoning step across transformer layers ($\ell = 1 \to L$), projected onto the first two PCA dimensions. Points are colored by their decoded regime ($R_0$--$R_{K-1}$). Stars mark per-regime centroids. Solid blue paths correspond to correct sequences; dashed orange paths to incorrect sequences.
  • Figure 5: The cumulative explained variance curve tends to flatten after 128 dimensions, with smaller gains from additional components.
  • ...and 7 more figures
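
Two of the quantities named in the captions above can be illustrated with a minimal sketch: step-to-step transitions between the semantic categories of Figure 1, and the cumulative explained variance curve of Figure 5. This excerpt does not describe the paper's actual estimation procedure, so everything below is an assumption made for illustration: the function names `transition_matrix` and `cumulative_explained_variance` are hypothetical, the transition estimate is a plain row-normalized count, the PCA is computed by SVD on centered data, and the traces and activations are toy inputs rather than results from the paper.

```python
import numpy as np

# Category labels taken from the Figure 1 caption.
CATEGORIES = ["SR", "AC", "UV", "FA"]


def transition_matrix(traces, categories=CATEGORIES, smoothing=0.0):
    """Row-normalized counts of category -> next-category transitions.

    Hypothetical helper: a simple count-based estimate, not necessarily
    the estimator used by PRISM.
    """
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = np.full((n, n), smoothing, dtype=float)
    for trace in traces:
        # Pair each step's category with the category of the next step.
        for prev, nxt in zip(trace, trace[1:]):
            counts[index[prev], index[nxt]] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero (no division by zero).
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def cumulative_explained_variance(activations):
    """Cumulative fraction of variance captured by leading PCA components.

    `activations` is an (n_samples, n_features) array; the PCA spectrum is
    obtained from the singular values of the mean-centered data.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()


if __name__ == "__main__":
    # Toy traces (made up): one reaches FA, one keeps looping in UV.
    toy_traces = [
        ["SR", "AC", "UV", "AC", "FA"],
        ["SR", "AC", "UV", "UV", "UV"],
    ]
    print(transition_matrix(toy_traces))

    # Random stand-in for per-step hidden states (not real model activations).
    rng = np.random.default_rng(0)
    print(cumulative_explained_variance(rng.normal(size=(50, 8))))
```

In this sketch, a large UV to UV entry in the matrix would correspond to the repeated verification steps that the abstract associates with failed trajectories, and the point where the cumulative curve flattens suggests how many PCA dimensions to retain. Fitting Gaussian mixture models on the reduced activations to obtain the regimes $z_1, z_2, \ldots$ is left out here because the excerpt does not specify that configuration.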