Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Ximing Lu, Seungju Han, David Acuna, Hyunwoo Kim, Jaehun Jung, Shrimai Prabhumoye, Niklas Muennighoff, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi

TL;DR

Retro-Search introduces a Monte-Carlo Tree Search–inspired method to retrospectively revise reasoning traces produced by large models, aiming to reduce under- and over-thinking by exploring untaken paths within existing thoughts. The approach yields two use cases: self-improvement, where models train on their own revised traces, and weak-to-strong revision, where a smaller model revises traces produced by a larger model to generate higher-quality data. Empirically, training on Retro-Search revised data improves accuracy while shortening reasoning on math benchmarks: in self-improvement, R1-distill-7B reduces average reasoning length by 31.2% with a 7.7% performance gain, and in weak-to-strong revision, Qwen2.5-32B achieves an 11.3% shorter reasoning length and a 2.4% performance gain over training on the original OpenThoughts data. Shorter traces also mean faster inference. These results challenge the notion that only longer traces or frontier models drive progress, showing that algorithmic data refinement can meaningfully enhance reasoning capabilities and efficiency in both self-contained and cross-model setups.

Abstract

Large reasoning models exhibit remarkable reasoning capabilities via long, elaborate reasoning trajectories. Supervised fine-tuning on such reasoning traces, also known as distillation, can be a cost-effective way to boost reasoning capabilities of student models. However, empirical observations reveal that these reasoning trajectories are often suboptimal, switching excessively between different lines of thought, resulting in under-thinking, over-thinking, and even degenerate responses. We introduce Retro-Search, an MCTS-inspired search algorithm, for distilling higher quality reasoning paths from large reasoning models. Retro-Search retrospectively revises reasoning paths to discover better, yet shorter traces, which can then lead to student models with enhanced reasoning capabilities and shorter, thus faster, inference. Our approach can enable two use cases: self-improvement, where models are fine-tuned on their own Retro-Search-ed thought traces, and weak-to-strong improvement, where a weaker model revises a stronger model's thought traces via Retro-Search. For self-improvement, R1-distill-7B, fine-tuned on its own Retro-Search-ed traces, reduces the average reasoning length by 31.2% while improving performance by 7.7% across seven math benchmarks. For weak-to-strong improvement, we retrospectively revise R1-671B's traces from the OpenThoughts dataset using R1-distill-32B as the Retro-Search-er, a model 20x smaller. Qwen2.5-32B, fine-tuned on this refined data, achieves performance comparable to R1-distill-32B, yielding an 11.3% reduction in reasoning length and a 2.4% performance improvement compared to fine-tuning on the original OpenThoughts data. Our work counters recently emergent viewpoints that question the relevance of search algorithms in the era of large reasoning models, by demonstrating that there are still opportunities for algorithmic advancements, even for frontier models.

Paper Structure

This paper contains 25 sections, 3 equations, 2 figures, 8 tables, 1 algorithm.

Figures (2)

  • Figure 1: An example reasoning trace from Retro-Search in weak-to-strong revision. A reasoning trace consists of a series of thoughts segmented by transition keywords (e.g., “alternatively”, “wait”), with each thought composed of a sequence of intermediate steps, delimited by '\n\n'. Retro-Search retrospectively revises reasoning trajectories - exploring promising thoughts that were prematurely abandoned to mitigate under-thinking while avoiding redundant thoughts once the correct answer is evident to reduce over-thinking.
  • Figure 2: An overview of our Retro-Search algorithm. The algorithm iterates through the thoughts and explores untaken paths from steps that come before a thought-switch, which is marked by transition keywords like "wait" or "another approach." During the process, it performs multiple rollouts, suppressing these transition keywords in the immediate next step. If the search is successful, the existing trajectory is replaced with the new rollout, and the process continues through the updated trajectory.
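
The two captions above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the keyword list is limited to the three examples named in the captions, a thought switch is detected by a step that begins with one of those keywords, length is counted in steps, and a search counts as "successful" when a rollout reaches a correct answer in fewer steps than the existing continuation. The `rollout` callable is hypothetical; it stands in for a model that continues from a prefix with transition keywords suppressed in the immediate next step, and returns `None` when it fails to reach a correct answer.

```python
from typing import Callable, List, Optional

# Assumed keyword list: only the examples named in the figure captions.
TRANSITION_KEYWORDS = ("alternatively", "wait", "another approach")

# Hypothetical rollout interface: given the steps kept so far, return a
# continuation that reaches a correct answer, or None on failure.
Rollout = Callable[[List[str]], Optional[List[str]]]


def split_steps(trace: str) -> List[str]:
    """Split a reasoning trace into intermediate steps on '\n\n'."""
    return [s.strip() for s in trace.split("\n\n") if s.strip()]


def starts_new_thought(step: str) -> bool:
    """Assumption: a thought switch is a step that begins with a keyword."""
    return step.lower().startswith(TRANSITION_KEYWORDS)


def group_thoughts(steps: List[str]) -> List[List[str]]:
    """Group consecutive steps into thoughts, cutting at transition keywords."""
    thoughts: List[List[str]] = []
    for step in steps:
        if thoughts and not starts_new_thought(step):
            thoughts[-1].append(step)
        else:
            thoughts.append([step])
    return thoughts


def retro_search(steps: List[str], rollout: Rollout,
                 num_rollouts: int = 2) -> List[str]:
    """Walk the trace; at each thought switch, try continuing the current
    thought instead, and keep a rollout if it is correct and shorter."""
    i = 1
    while i < len(steps):
        if starts_new_thought(steps[i]):
            prefix, rest = steps[:i], steps[i:]
            candidates = [rollout(prefix) for _ in range(num_rollouts)]
            # Assumed success criterion: correct (not None) and fewer steps.
            better = [c for c in candidates
                      if c is not None and len(c) < len(rest)]
            if better:
                # Replace the existing trajectory, then keep scanning it.
                steps = prefix + min(better, key=len)
        i += 1
    return steps
```

Because the loop index always advances, the scan terminates even when a rollout itself contains later thought switches; those are simply revisited as part of the updated trajectory.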