Fluid Representations in Reasoning Models

Dmitrii Kharlapenko; Alessandro Stolfo; Arthur Conmy; Mrinmaya Sachan; Zhijing Jin

TL;DR

The paper investigates how a reasoning-focused language model, QwQ-32B, builds abstract representations of problem structure during extended reasoning in semantically obfuscated BlocksWorld tasks. It introduces a representation-collection pipeline to extract action/predicate vectors, demonstrates cross-naming convergence toward surface-invariant encodings, and shows via steering experiments that refined representations causally improve problem solving. The results reveal fluid, context-dependent refinements that converge to symbolic abstractions, enabling transfer across obfuscated namings and indicating a general mechanism by which extended reasoning traces enhance understanding of abstract structure. Together, these findings advance interpretability of long-form reasoning by linking representational adaptation to performance and suggesting design directions for future reasoning-enabled systems.

Abstract

Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal model mechanisms that allow this superior performance remain poorly understood. We present a mechanistic analysis of how QwQ-32B, a model specifically trained to produce extensive reasoning traces, processes abstract structural information. On Mystery Blocksworld, a semantically obfuscated planning domain, we find that QwQ-32B gradually improves its internal representation of actions and concepts during reasoning. The model develops abstract encodings that focus on structure rather than specific action names. Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces boosts accuracy, while symbolic representations can replace many obfuscated encodings with minimal performance loss. We find that one of the factors driving reasoning model performance is in-context refinement of token representations, which we dub Fluid Reasoning Representations.

Paper Structure (61 sections, 5 equations, 11 figures, 5 tables)

This paper contains 61 sections, 5 equations, 11 figures, 5 tables.

Introduction
Key Observations.
Background
BlocksWorld.
Mystery BlocksWorld.
Terminology.
Initial Evaluations
Mystery Performance Analysis
Representation Collection
Overview.
Representation extraction.
In-naming representations.
Cross-naming representations.
Representational Studies
Cross-Naming Representational Convergence
...and 46 more sections

Figures (11)

Figure 1: Overview of our pipeline. Left: QwQ-32B's accuracy on Standard BlocksWorld is 96%. Center: Mystery BlocksWorld obfuscates semantics (e.g., "pick up" $\to$ "attack"), reducing QwQ's accuracy to 33%. During extended reasoning traces, the model progressively refines internal representations of obfuscated actions, developing abstract symbolic encodings (vectors $v_0, \dots, v_3$ and $u_0, \dots, u_3$ are extracted at different Chain-of-Thought timestamps). Right: Steering experiments inject these refined representations into early reasoning stages, improving accuracy to up to 43%, demonstrating that representational adaptations causally contribute to problem-solving performance.
Figure 2: Average similarity of representations from other namings with naming 1 representations, extracted from different timestamps.
Figure 3: Layer-wise PCA of action representations from different mystery namings extracted at 7k tokens. Additional layers are shown in the appendix.
Figure 4: Similarity with cross-naming representations between Mystery and Original BlocksWorld traces. (a) Shows average similarities of centered action/predicate representations from all timestamps in Mystery Blocksworld traces with cross-naming representations extracted at 7k tokens. Note that similarities between different actions become increasingly negative. (b) Shows average similarities of clean BlocksWorld representations from all timestamps with cross-naming representations extracted at 7k tokens. Plot for predicates is absent, since it's much harder to identify their tokens in regular BlocksWorld traces.
Figure 5: Average similarity of representations extracted from the 7k timestamp, plotted for both QwQ and its base model on QwQ traces. (a) Shows similarity of representations from other namings with naming 1 representations (averaged across all other namings). (b) Shows similarity of representations from original BlocksWorld traces with representations from different mystery namings (averaged across them).
...and 6 more figures
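
The captions for Figures 2, 4, and 5 refer to average similarities between (centered) action representations. As a rough illustration only, the sketch below computes cosine similarity between two sets of mean-centered vectors. The similarity measure (cosine), the centering step (subtracting the mean over actions), the function names, and the toy three-dimensional vectors are all assumptions made for this example; they are not taken from the paper's pipeline.

```python
import math

def center(vectors):
    """Subtract the mean vector (over all actions) from each vector."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centered_similarity(reps_a, reps_b):
    """Cosine similarity for every action pair after centering each set.

    reps_a, reps_b: dicts mapping the same action labels to vectors
    (hypothetical stand-ins for representations from two namings).
    """
    names = sorted(reps_a)
    ca = dict(zip(names, center([reps_a[n] for n in names])))
    cb = dict(zip(names, center([reps_b[n] for n in names])))
    return {(m, n): cosine(ca[m], cb[n]) for m in names for n in names}

# Toy data (made up): naming B encodes the same structure at a different scale.
naming_a = {"pick up": [1.0, 0.0, 0.0], "put down": [0.0, 1.0, 0.0], "stack": [0.0, 0.0, 1.0]}
naming_b = {"pick up": [2.0, 0.0, 0.0], "put down": [0.0, 2.0, 0.0], "stack": [0.0, 0.0, 2.0]}

sim = centered_similarity(naming_a, naming_b)
same_action = sim[("pick up", "pick up")]
different_action = sim[("pick up", "put down")]
```

In this toy case the same action matches across namings, while centering pushes the similarity between different actions below zero, which is the kind of pattern the Figure 4 caption notes for different actions.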
