Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics
Sheldon Yu, Yuxin Xiong, Junda Wu, Xintong Li, Tong Yu, Xiang Chen, Ritwik Sinha, Jingbo Shang, Julian McAuley
TL;DR
The paper tackles the limited explainability of chain-of-thought (CoT) reasoning in LLMs by introducing a state-aware transition framework that converts CoT trajectories into latent dynamics. It segments each CoT into steps, computes spectral embeddings from token representations, accumulates Gram matrices to form $G_t$, derives $E_t$ as a vector of eigenvalues, clusters these embeddings into latent states, and models transitions between states with a first-order Markov chain $P$ to reveal global reasoning dynamics. The contributions include a general abstraction of CoT into latent dynamics, visualization and diagnostics of semantic roles and transitions, and empirical evidence of consistent latent patterns across multiple tasks and models, validated via trajectory simulations. The findings show that latent states correspond to interpretable functional roles and that transitions follow coherent, directionally meaningful patterns that align with the ordering of steps in real CoT outputs. The framework improves interpretability and provides practical diagnostics for CoT outputs in real-world reasoning tasks.
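The step-embedding stage summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each step's Gram matrix has the $d \times d$ form $H_t^\top H_t$ (where $H_t$ stacks the step's token embeddings), that $G_t$ is the running sum of these matrices, and that $E_t$ keeps the $k$ largest eigenvalues of $G_t$. The function name `step_spectral_embeddings`, the parameter `k`, and the random input data are all hypothetical.

```python
import numpy as np

def step_spectral_embeddings(step_token_embeddings, k=4):
    """Return one eigenvalue vector E_t per reasoning step.

    step_token_embeddings: list of arrays, one per step, each of shape
    (n_tokens_in_step, d). The d x d Gram form is an assumption.
    """
    d = step_token_embeddings[0].shape[1]
    gram = np.zeros((d, d))  # running G_t, accumulated across steps
    embeddings = []
    for tokens in step_token_embeddings:
        gram = gram + tokens.T @ tokens        # add this step's Gram matrix
        eigvals = np.linalg.eigvalsh(gram)     # real eigenvalues, ascending order
        embeddings.append(eigvals[::-1][:k])   # keep the k largest as E_t
    return np.stack(embeddings)

# Hypothetical input: three steps with 5, 7, and 3 tokens, embedding size 8.
rng = np.random.default_rng(0)
steps = [rng.normal(size=(n, 8)) for n in (5, 7, 3)]
E = step_spectral_embeddings(steps, k=4)
print(E.shape)  # (3, 4)
```

Under these assumptions each row of `E` is a fixed-length vector regardless of how many tokens the step contains, so the rows can be passed directly to a standard clustering routine to obtain latent state labels.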
Abstract
Recent advances in chain-of-thought (CoT) prompting have enabled large language models (LLMs) to perform multi-step reasoning. However, the explainability of such reasoning remains limited: prior work focuses primarily on local token-level attribution, leaving the high-level semantic roles of reasoning steps and the transitions between them underexplored. In this paper, we introduce a state-aware transition framework that abstracts CoT trajectories into structured latent dynamics. Specifically, to capture the evolving semantics of CoT reasoning, each reasoning step is represented via spectral analysis of token-level embeddings and clustered into semantically coherent latent states. To characterize the global structure of reasoning, we model the progression of these latent states as a Markov chain, yielding a structured and interpretable view of the reasoning process. This abstraction supports a range of analyses, including semantic role identification, temporal pattern visualization, and consistency evaluation.
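The Markov chain step can be illustrated with a short sketch. It assumes each CoT trajectory has already been reduced to a sequence of integer latent-state labels, and it estimates a first-order transition matrix by counting consecutive state pairs and normalizing each row. The function name `estimate_transition_matrix` and the toy trajectories are hypothetical, and the paper may use a different estimator (for example, one with smoothing).

```python
from collections import Counter

def estimate_transition_matrix(sequences, n_states):
    """Estimate first-order transition probabilities from state sequences."""
    counts = Counter()
    for seq in sequences:
        # Count every consecutive (current state, next state) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1

    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No outgoing transitions observed for this state.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix

# Hypothetical trajectories of latent-state labels, one list per CoT.
trajectories = [[0, 1, 2], [0, 1, 1, 2], [0, 2]]
P = estimate_transition_matrix(trajectories, n_states=3)
for row in P:
    print([round(p, 3) for p in row])
```

In this toy data, state 0 moves to state 1 in two of three cases and state 2 has no observed successor, so its row is left as zeros rather than guessed.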
