
Directional Reasoning Trajectory Change (DRTC): Identifying Critical Trace Segments in Reasoning Models

Waldemar Chang

TL;DR

This paper introduces Directional Reasoning Trajectory Change (DRTC), a process-causal framework for interpreting long-form reasoning from a single on-policy rollout. DRTC gives a causally grounded, trajectory-level view of how specific context elements steer reasoning under on-policy dynamics.

Abstract

Understanding how language models carry out long-horizon reasoning remains an open challenge. Existing interpretability methods often highlight tokens or spans correlated with an answer, but they rarely reveal where the model makes consequential reasoning turns, which earlier context causally triggers those turns, or whether the highlighted text actually steers the reasoning process. We introduce Directional Reasoning Trajectory Change (DRTC), a process-causal framework for interpreting long-form reasoning from a single on-policy rollout. DRTC detects pivot decision points using uncertainty and distribution-shift signals, then applies receiver-side interventions that preserve the realized rollout without resampling the continuation while blocking information flow from selected earlier chunks only at a pivot. It measures whether each intervention redirects the direction of the model's log-probability trajectory relative to the realized rollout direction, producing a signed per-chunk attribution score. We also compute turning-angle curvature changes on raw logits as a complementary diagnostic and introduce curvature signatures to summarize shared intervention-response geometry. Empirically, directional influence is sharply concentrated across four reasoning models (per-example |DRTC| shares yield Gini 0.50 to 0.58 and top-5 percent mass 0.23 to 0.28), and learned pivots induce stronger intervention magnitudes than matched random spans. In a scaling study on 500 MATH problems with R1-Distill-Qwen-1.5B, learned spans outperform matched random spans (median delta = 0.409, 355 of 500 positive; sign test p = 2.3e-21). Overall, DRTC provides a causally grounded, trajectory-level view of how specific context elements steer reasoning under on-policy dynamics.
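The sign test quoted for the 500-problem scaling study can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a two-sided exact binomial test with all 500 examples counted (no ties dropped), and the function name `sign_test_p` is hypothetical. The abstract does not say which variant produced the reported p-value, so the result should only be expected to agree in order of magnitude.

```python
from math import comb

def sign_test_p(n_pos: int, n: int) -> float:
    """Two-sided exact sign test p-value under H0: P(positive delta) = 0.5.

    Assumption: ties are not removed and the test is two-sided.
    """
    # Use the larger of the two counts so the tail is always the upper one.
    k = max(n_pos, n - n_pos)
    # Exact integer count of outcomes at least as extreme as k successes.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    # Double the one-sided tail and cap at 1 (the symmetric case k = n / 2).
    return min(1.0, 2 * tail / 2**n)

# Counts taken from the abstract: 355 of 500 per-example deltas are positive.
p_value = sign_test_p(355, 500)
print(f"sign test p-value: {p_value:.3g}")
```

Because the test uses only the sign of each per-example delta, it makes no assumption about the distribution of the deltas themselves, which suits a heavy-tailed attribution score.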

Paper Structure (106 sections, 18 equations, 23 figures, 17 tables)

Figures (23)

  • Figure 1: DRTC pipeline overview. Curvature is diagnostic only and is not used to define pivots or scores.
  • Figure 2: Curvature invariance under diagnostic logging (representative model: R1-Distill-Qwen-1.5B). Per-chunk DRTC scores from C0 and C8 lie on the identity line ($\rho=1.000$), confirming that enabling curvature computation is strictly diagnostic and does not alter attribution.
  • Figure 3: Cross-model comparison of median per-example mean pivot-local intervention magnitude. Learned pivots (C8) induce stronger interventions than matched random spans (C9) across all four models. Error bars denote 95% bootstrap confidence intervals.
  • Figure 4: C0 vs. C8 invariance across models. Each panel shows per-chunk DRTC scores under C0 (baseline) versus C8 (curvature-enabled). Points lie exactly on the identity line, confirming that curvature computation is strictly diagnostic and does not alter attribution.
  • Figure 5: Per-example DRTC attribution concentration across models. Each panel shows the fraction of total per-example $|\mathrm{DRTC}|$ mass captured by the top-ranked chunk(s) within an example. Across architectures, a small number of chunks accounts for a disproportionate share of directional influence, consistent with moderate but robust sparsity (see Tabs. \ref{tab:gini_all}--\ref{tab:topk_mass}).
  • ...and 18 more figures
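
The concentration statistics behind Figure 5 (Gini coefficient and top-fraction mass of per-example $|\mathrm{DRTC}|$ shares) can be illustrated with a short sketch. The function names `gini` and `top_mass`, the rank-based Gini formula, and the choice to round the top fraction up to at least one chunk are assumptions made for illustration; the paper's exact definitions are not given in this excerpt.

```python
from math import ceil

def gini(scores):
    """Gini coefficient of the absolute scores (0 = uniform, near 1 = concentrated).

    Assumption: the standard rank-based formula on sorted magnitudes.
    """
    mags = sorted(abs(s) for s in scores)
    n, total = len(mags), sum(mags)
    if n == 0 or total == 0:
        return 0.0
    # Weight each magnitude by its 1-based rank in ascending order.
    weighted = sum(rank * m for rank, m in enumerate(mags, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def top_mass(scores, frac=0.05):
    """Share of total absolute score held by the top `frac` of chunks.

    Assumption: the chunk count is rounded up and is at least one.
    """
    mags = sorted((abs(s) for s in scores), reverse=True)
    total = sum(mags)
    if total == 0:
        return 0.0
    k = max(1, ceil(frac * len(mags)))
    return sum(mags[:k]) / total

# Hypothetical signed per-chunk scores for one example (not data from the paper).
example_scores = [0.9, -0.4, 0.1, 0.05, -0.05, 0.02, 0.01, 0.0]
print(gini(example_scores), top_mass(example_scores, frac=0.25))
```

Both statistics operate on magnitudes, so a chunk that strongly opposes the realized direction counts as much as one that strongly supports it.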