Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
Jianing Wang, Jin Jiang, Yang Liu, Mengdi Zhang, Xunliang Cai
TL;DR
This work tackles the susceptibility of LLMs to errors in complex reasoning by introducing process prejudge, a test-time pause that anticipates and mitigates potential mistakes. The authors propose Prejudge-Before-Think (PBT), a dynamic tree-searching framework in which a single LLM performs thinking, critique, prejudging, and verification, guided by prejudge nodes defined through a backpropagated value function. The prejudge-rich data synthesized by this framework then feeds a two-phase post-training pipeline that combines supervised fine-tuning and reinforcement learning to improve reasoning efficiency, achieving substantial gains on competition-level benchmarks. The approach demonstrates that prejudging before thinking can meaningfully improve reasoning accuracy, with practical implications for scalable, robust LLM systems, albeit at a notable test-time computational cost. The authors release code and data to enable further exploration and adoption of prejudge-informed reasoning.
Abstract
In this paper, we introduce a new process prejudge strategy for LLM reasoning and show that bootstrapping with process prejudge allows an LLM to adaptively anticipate the errors it may encounter when advancing to subsequent reasoning steps. This mirrors how people sometimes pause to consider what mistakes may occur and how to avoid them, rather than relying solely on trial and error. Specifically, we define a prejudge node in the rationale as a reasoning step that is followed by at least one step with no path toward the correct answer. To synthesize the prejudge reasoning process, we present an automated reasoning framework with a dynamic tree-searching strategy. This framework requires only one LLM to perform answer judging, response critiquing, prejudge generation, and thought completion. Furthermore, we develop a two-phase training mechanism with supervised fine-tuning (SFT) and reinforcement learning (RL) to further enhance the reasoning capabilities of LLMs. Experimental results on competition-level complex reasoning benchmarks demonstrate that our method can teach the model to prejudge before thinking and significantly enhance the reasoning ability of LLMs. Code and data are released at https://github.com/wjn1996/Prejudge-Before-Think.
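To make the prejudge-node definition concrete, the following is a minimal sketch, not the authors' implementation. It assumes a reasoning tree in which each node is one step and each leaf is labeled as reaching a correct or incorrect final answer. The names `StepNode`, `reaches_correct_answer`, `find_prejudge_nodes`, and the `require_reachable` flag are all hypothetical. Read literally, the definition also marks steps inside dead-end branches. The stricter variant, which additionally requires the node itself to still reach a correct answer, is an assumption, and the abstract does not confirm it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepNode:
    """One reasoning step in a hypothetical tree of sampled rationales."""
    text: str
    children: List["StepNode"] = field(default_factory=list)
    # Only meaningful for leaves: whether the final answer reached here is correct.
    is_correct: Optional[bool] = None


def reaches_correct_answer(node: StepNode) -> bool:
    """True if at least one path starting at this node ends in a correct leaf."""
    if not node.children:
        return bool(node.is_correct)
    return any(reaches_correct_answer(child) for child in node.children)


def find_prejudge_nodes(root: StepNode, require_reachable: bool = False) -> List[StepNode]:
    """Collect steps followed by at least one step with no path to the correct answer.

    require_reachable (hypothetical): if True, also require the node itself
    to have some path to a correct answer, i.e. a true branching point.
    """
    found: List[StepNode] = []
    stack = [root]  # iterative depth-first traversal
    while stack:
        node = stack.pop()
        has_dead_child = any(not reaches_correct_answer(c) for c in node.children)
        if has_dead_child and (not require_reachable or reaches_correct_answer(node)):
            found.append(node)
        stack.extend(node.children)
    return found


# Toy tree: "set up equation" branches into a dead end and a correct continuation.
leaf_wrong = StepNode("answer = 7", is_correct=False)
leaf_right = StepNode("answer = 5", is_correct=True)
s2a = StepNode("apply wrong formula", [leaf_wrong])
s2b = StepNode("apply right formula", [leaf_right])
s1 = StepNode("set up equation", [s2a, s2b])
root = StepNode("read problem", [s1])

print([n.text for n in find_prejudge_nodes(root)])
# ['set up equation', 'apply wrong formula']
print([n.text for n in find_prejudge_nodes(root, require_reachable=True)])
# ['set up equation']
```

The example recomputes reachability for each visited node, which keeps it short. For large search trees, one post-order pass that caches `reaches_correct_answer` per node would avoid the repeated work.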
