
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning

Martina G. Vilas, Safoora Yousefi, Besmira Nushi, Eric Horvitz, Vidhisha Balachandran

TL;DR

This work introduces Latent-Trajectory (LT) signals, a training-free set of metrics derived from hidden-state trajectories during reasoning traces that predict final-answer accuracy. By segmenting traces into blocks and aggregating per-layer latent states, the authors define Net Change, Cumulative Change, and Aligned Change, which outperform surface-level and output-based cues in predicting correctness. They show that LT signals make multi-sample inference more efficient and enable early selection of high-quality traces, reducing token usage by up to ~70% while maintaining or improving accuracy. The findings offer practical inference-time strategies and a deeper understanding of how latent representations evolve as language models reason.
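
The block-level trajectory construction summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the hidden states are available as `hidden[layer][token]` vectors, that a trace is cut into consecutive, non-overlapping blocks of a fixed, hypothetical `block_size`, and that each block is summarized by mean-pooling its tokens separately at every layer. The names `mean_pool` and `block_trajectories` are illustrative only.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def block_trajectories(hidden: List[List[Vector]], block_size: int) -> List[List[Vector]]:
    """Turn token-level hidden states into one block-level trajectory per layer.

    hidden[layer][token] is the hidden-state vector of one reasoning token at one layer.
    Returns traj[layer][block]: the mean hidden state of each consecutive,
    non-overlapping block of tokens (the last block may be shorter).
    Mean-pooling is an assumption made for illustration.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    trajectories = []
    for layer_states in hidden:
        blocks = [
            mean_pool(layer_states[start:start + block_size])
            for start in range(0, len(layer_states), block_size)
        ]
        trajectories.append(blocks)
    return trajectories


# Toy example: 2 layers, 4 tokens, 2-dimensional hidden states, blocks of 2 tokens.
hidden = [
    [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]],  # layer 0
    [[0.0, 1.0], [0.0, 3.0], [2.0, 5.0], [2.0, 7.0]],  # layer 1
]
print(block_trajectories(hidden, block_size=2))
# [[[2.0, 0.0], [6.0, 2.0]], [[0.0, 2.0], [2.0, 6.0]]]
```

Each per-layer trajectory can then be scored separately and the per-layer scores combined; how the paper aggregates across layers is not specified here.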

Abstract

Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remains a key opportunity: reliably predicting productive paths can substantially reduce wasted computation and improve overall efficiency. We introduce Latent-Trajectory signals that characterize the temporal evolution of a model's internal representations during the generation of intermediate reasoning tokens. By measuring the overall change in latent representations between the start and end of reasoning, the change accumulated across intermediate steps, and the extent to which these changes advance toward the final state, we show that these signals predict solution accuracy more reliably than both cross-layer metrics and output-based confidence measures. When used to guide answer selection across multiple sampled generations, Latent-Trajectory signals make test-time scaling more effective and efficient than majority voting, reducing token usage by up to 70% while preserving and even improving accuracy by 2.6% on average. Moreover, these predictive signals often emerge early in the reasoning trace, enabling early selection and allocation of compute to the most promising candidates. Our findings contribute not only practical strategies for inference-time efficiency, but also a deeper interpretability perspective on how reasoning processes are represented and differentiated in latent space.
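
The three quantities described in the abstract can be illustrated on a single block-level trajectory h_0, ..., h_T. The formulas used here are assumptions chosen to match the verbal descriptions, not the paper's exact definitions: Net Change as the Euclidean distance ||h_T - h_0||, Cumulative Change as the sum of step lengths ||h_{t+1} - h_t||, and Aligned Change as the mean cosine similarity between each step and the remaining displacement h_T - h_t toward the final state. The paper may normalize or aggregate these differently (for example, across layers).

```python
import math
from typing import Dict, List

Vector = List[float]


def subtract(a: Vector, b: Vector) -> Vector:
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def norm(v: Vector) -> float:
    """Euclidean (L2) norm."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero length."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def lt_signals(traj: List[Vector]) -> Dict[str, float]:
    """Illustrative Net, Cumulative, and Aligned Change for one trajectory.

    traj[t] is the (hypothetical) block-level representation after block t.
    The exact formulas are assumptions, not the paper's definitions.
    """
    if len(traj) < 2:
        raise ValueError("need at least two blocks")
    final = traj[-1]
    steps = [subtract(traj[t + 1], traj[t]) for t in range(len(traj) - 1)]
    # Overall displacement between the start and end of reasoning.
    net_change = norm(subtract(final, traj[0]))
    # Total path length accumulated across intermediate steps.
    cumulative_change = sum(norm(step) for step in steps)
    # How much each step points toward the final state, averaged over steps.
    aligned_change = sum(
        cosine(step, subtract(final, traj[t])) for t, step in enumerate(steps)
    ) / len(steps)
    return {
        "net_change": net_change,
        "cumulative_change": cumulative_change,
        "aligned_change": aligned_change,
    }


direct = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
wandering = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
print(lt_signals(direct))     # {'net_change': 2.0, 'cumulative_change': 2.0, 'aligned_change': 1.0}
print(lt_signals(wandering))  # net 1.0, cumulative 3.0, aligned ~0.569
```

By the triangle inequality, Cumulative Change under this sketch is never smaller than Net Change, so a wandering trace (large Cumulative, small Net) separates from a direct one, the same qualitative pattern reported for correct versus incorrect traces. Because lower Cumulative Change goes with correctness, it would need to be negated before being used as a "higher is better" score.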

Paper Structure

This paper contains 31 sections, 6 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Latent-Trajectory framework. Trajectory vectors are constructed from token-level hidden states, and a set of three signals is derived to quantify their temporal evolution. These signals predict successful traces and enable answer selection and early path selection in multi-sample inference.
  • Figure 2: Latent-Trajectory signals.
  • Figure 3: ROC-AUC for distinguishing correct from incorrect predictions using Latent-Trajectory (LT) signals and baseline metrics. Higher values indicate better discriminative power. For comparability, Cumulative Change was sign-reversed. LT signals consistently achieve above-chance discrimination (dashed line) that is more reliable than that of the baseline metrics. Error bars denote variability across models.
  • Figure 4: Latent-Trajectory signal distributions by accuracy for Qwen3-14B on the AIME 2025 dataset. Correct traces show larger Net/Aligned Change and smaller Cumulative Change than incorrect ones. This indicates that correct reasoning corresponds to larger, more directed representational shifts, while incorrect reasoning involves more wandering and less aligned trajectories.
  • Figure 5: Candidate solutions for a problem are evaluated sequentially. If a solution’s signal value exceeds $\tau$, it is immediately accepted as the final prediction. If no solution crosses $\tau$, the final answer is chosen via majority voting (MV).
  • ...and 9 more figures
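
The sequential acceptance rule described for Figure 5 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each candidate has already been reduced to a final answer string and a scalar signal where higher means more likely correct (for Cumulative Change, this would be the sign-reversed value). The function name `early_select` and its input format are hypothetical, and how τ is calibrated is not addressed here.

```python
from collections import Counter
from typing import Sequence, Tuple


def early_select(candidates: Sequence[Tuple[str, float]], tau: float) -> Tuple[str, int]:
    """Accept the first candidate whose signal exceeds tau, else majority-vote.

    candidates: (final_answer, signal_value) pairs in evaluation order.
    Returns (chosen_answer, number_of_candidates_inspected).
    In the fallback, Counter.most_common breaks ties by first occurrence.
    """
    if not candidates:
        raise ValueError("no candidates")
    for inspected, (answer, signal) in enumerate(candidates, start=1):
        if signal > tau:
            return answer, inspected  # accept immediately; remaining candidates are skipped
    votes = Counter(answer for answer, _ in candidates)
    return votes.most_common(1)[0][0], len(candidates)


samples = [("42", 0.30), ("17", 0.90), ("42", 0.95)]
print(early_select(samples, tau=0.8))  # ('17', 2)
print(early_select(samples, tau=1.0))  # ('42', 3), via the majority-vote fallback
```

The second return value shows where savings can come from: once a candidate crosses τ, the remaining candidates are never inspected, so no further compute needs to be spent on them.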