Lost at the Beginning of Reasoning
Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz
TL;DR
This work investigates long chain-of-thought reasoning in large language models and finds that the initial reasoning step $t_1$ disproportionately shapes the final prediction: an incorrect first step causes large accuracy drops and triggers extensive subsequent reasoning (overthinking). It introduces an efficient sampling method that generates multiple candidate first steps, scores them with a reward model, and continues reasoning only from the top $M$ candidates, achieving up to a $70\%$ reduction in inference cost without sacrificing accuracy. The approach is validated across five model families and five challenging benchmarks in math, science, and programming, demonstrating robust first-step influence and practical gains. The findings suggest that optimizing or leveraging the very first step is a fruitful direction for building more accurate and compute-efficient reasoning LLMs.
Abstract
Recent progress in large language models (LLMs) has significantly improved complex reasoning capabilities, particularly through extended chain-of-thought (CoT) reasoning that incorporates mechanisms such as backtracking, self-reflection, and self-correction. Despite these developments, the self-correction abilities of LLMs during long CoT reasoning remain underexplored, and recent findings on overthinking suggest that such models often engage in unnecessarily redundant reasoning. In this work, we empirically show that the first reasoning step exerts a disproportionately large influence on the final prediction: errors introduced at this stage can substantially degrade subsequent reasoning quality. This phenomenon is consistently observed across various state-of-the-art open- and closed-source reasoning models. Building on this insight, we propose an efficient sampling strategy that uses a reward model to identify and retain high-quality first reasoning steps while discarding suboptimal ones, achieving up to a 70% reduction in inference cost without sacrificing accuracy. Our work highlights the central role of the first reasoning step in generating a high-quality reasoning trajectory, thereby enabling significantly more efficient sampling.
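The sampling strategy summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three callables (`sample_first_step`, `score_first_step`, `continue_reasoning`) are hypothetical stand-ins for the reasoning model and the reward model, the defaults for `n_candidates` and `top_m` are arbitrary, and aggregating the surviving answers by majority vote is an assumption made here for illustration.

```python
from collections import Counter
from typing import Callable


def first_step_sampling(
    question: str,
    sample_first_step: Callable[[str], str],        # hypothetical: draws one candidate first step
    score_first_step: Callable[[str, str], float],  # hypothetical: reward-model score for a first step
    continue_reasoning: Callable[[str, str], str],  # hypothetical: finishes the trace, returns an answer
    n_candidates: int = 8,                          # arbitrary default, not from the paper
    top_m: int = 2,                                 # arbitrary default, not from the paper
) -> str:
    """Sample several first steps, keep the top-M by reward, continue only those."""
    if not 1 <= top_m <= n_candidates:
        raise ValueError("top_m must satisfy 1 <= top_m <= n_candidates")

    # 1. Generate only the (cheap) first reasoning step for each candidate.
    candidates = [sample_first_step(question) for _ in range(n_candidates)]

    # 2. Rank candidates by reward-model score, highest first.
    ranked = sorted(
        candidates,
        key=lambda step: score_first_step(question, step),
        reverse=True,
    )

    # 3. Spend the (expensive) full continuation only on the top-M first steps.
    answers = [continue_reasoning(question, step) for step in ranked[:top_m]]

    # 4. Aggregate by majority vote (an assumption of this sketch); ties go to
    #    the answer that came from the higher-scored first step.
    return Counter(answers).most_common(1)[0][0]
```

In this sketch the saving comes from step 3: only `top_m` of the `n_candidates` traces are generated in full, while the rest are discarded after their first step.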
