Fluid Representations in Reasoning Models
Dmitrii Kharlapenko, Alessandro Stolfo, Arthur Conmy, Mrinmaya Sachan, Zhijing Jin
TL;DR
The paper investigates how a reasoning-focused language model, QwQ-32B, builds abstract representations of problem structure during extended reasoning in semantically obfuscated Blocksworld tasks. It introduces a representation-collection pipeline to extract action/predicate vectors, demonstrates cross-naming convergence toward surface-invariant encodings, and shows via steering experiments that refined representations causally improve problem solving. The results reveal fluid, context-dependent refinements that converge to symbolic abstractions, enabling transfer across obfuscated namings and indicating a general mechanism by which extended reasoning traces enhance understanding of abstract structure. Together, these findings advance interpretability of long-form reasoning by linking representational adaptation to performance and suggesting design directions for future reasoning-enabled systems.
Abstract
Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal mechanisms that allow this superior performance remain poorly understood. We present a mechanistic analysis of how QwQ-32B, a model specifically trained to produce extensive reasoning traces, processes abstract structural information. On Mystery Blocksworld, a semantically obfuscated planning domain, we find that QwQ-32B gradually improves its internal representation of actions and concepts during reasoning. The model develops abstract encodings that focus on structure rather than specific action names. Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces boosts accuracy, while symbolic representations can replace many obfuscated encodings with minimal performance loss. We find that one of the factors driving reasoning model performance is in-context refinement of token representations, which we dub Fluid Reasoning Representations.
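
As a rough illustration of what "injecting refined representations" could look like, the sketch below averages hidden-state vectors at chosen token positions across several traces and then blends that average into another sequence's hidden states. It is a minimal toy on random NumPy arrays, not the paper's pipeline: the function names `mean_representation` and `steer`, the blending weight `alpha`, and the array shapes are all assumptions made for this example.

```python
import numpy as np


def mean_representation(traces, token_positions):
    """Average hidden-state vectors taken at the given positions.

    traces: list of arrays, each of shape (seq_len, d_model)
    token_positions: list of lists of int positions, one list per trace
    (both hypothetical stand-ins for activations at action-name tokens)
    """
    vectors = [
        trace[pos]
        for trace, positions in zip(traces, token_positions)
        for pos in positions
    ]
    return np.mean(vectors, axis=0)


def steer(hidden, positions, vector, alpha=1.0):
    """Return a copy of `hidden` with `vector` blended in at `positions`.

    alpha=1.0 replaces the original vectors outright;
    0 < alpha < 1 interpolates between original and injected vectors.
    """
    steered = hidden.copy()
    steered[positions] = (1 - alpha) * steered[positions] + alpha * vector
    return steered


# Toy data: three "successful traces" with 6 tokens and 4 hidden dimensions.
rng = np.random.default_rng(0)
traces = [rng.normal(size=(6, 4)) for _ in range(3)]
positions_per_trace = [[1, 4], [2], [0, 5]]

refined = mean_representation(traces, positions_per_trace)

# Inject the averaged vector into positions 2 and 3 of a new sequence.
hidden = rng.normal(size=(6, 4))
steered = steer(hidden, [2, 3], refined, alpha=1.0)
```

In a real model the arrays would come from a chosen layer's activations (for example via forward hooks), and the choice of layer, positions, and blending weight would need to be validated empirically.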
