CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute
Chen Jin, Ryutaro Tanno, Tom Diethe, Philip Teare
TL;DR
CoRefine introduces confidence-guided self-refinement, which enables adaptive, token-efficient reasoning. A lightweight Conv1D controller reads full-trace, token-level confidence and chooses one of three actions: HALT, RETHINK, or ALTERNATIVE. Because confidence is treated as a control signal rather than as a correctness estimate, the method matches or exceeds the accuracy of large-scale parallel sampling while using roughly 190× fewer tokens than 512-sample baselines and delivering substantial wall-clock speedups. The approach is validated on three open-source models and diverse math benchmarks, with strong results in both standard settings and a regulated-domain setting (BixBench). It is further extended with CoRefine-Tree, a hybrid sequential-parallel variant. The work provides a modular, generalizable primitive for scalable reasoning and for agentic systems with imperfect verifiers, enabling targeted refinement and safe halting decisions in practical deployments.
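The controller can be pictured as a small 1D convolutional classifier over the per-token confidence trace. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class name `ConfidenceController`, the single convolution layer with global average pooling (used here so that traces of any length map to a fixed-size feature vector), the filter count, the kernel size, and the random weights are all hypothetical. The paper's trained controller has about 211k parameters, and its exact architecture and confidence definition are not given in this section.

```python
import math
import random
from typing import List, Sequence

ACTIONS = ("HALT", "RETHINK", "ALTERNATIVE")


def conv1d_relu(x: Sequence[float], kernels: List[List[float]],
                biases: List[float]) -> List[List[float]]:
    """'Valid' (unpadded) 1D convolution of a single-channel signal.

    Produces one output channel per kernel, then applies ReLU.
    """
    out = []
    for kernel, b in zip(kernels, biases):
        k = len(kernel)
        channel = []
        for t in range(len(x) - k + 1):
            s = b + sum(kernel[j] * x[t + j] for j in range(k))
            channel.append(max(0.0, s))  # ReLU
        out.append(channel)
    return out


def global_avg_pool(channels: List[List[float]]) -> List[float]:
    """Average each channel over time, giving a fixed-size feature vector."""
    return [sum(c) / len(c) for c in channels]


class ConfidenceController:
    """Hypothetical tiny controller: Conv1D -> ReLU -> avg pool -> linear -> softmax.

    Weights are random for illustration only; a real controller would be trained.
    """

    def __init__(self, n_filters: int = 4, kernel_size: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.kernel_size = kernel_size
        self.kernels = [[rng.gauss(0.0, 0.5) for _ in range(kernel_size)]
                        for _ in range(n_filters)]
        self.biases = [0.0] * n_filters
        # One row of linear weights per action.
        self.w = [[rng.gauss(0.0, 0.5) for _ in range(n_filters)] for _ in ACTIONS]
        self.b = [0.0] * len(ACTIONS)

    def probs(self, confidence: Sequence[float]) -> List[float]:
        """Return a softmax distribution over ACTIONS for one confidence trace."""
        if len(confidence) < self.kernel_size:
            raise ValueError("confidence trace is shorter than the kernel")
        feats = global_avg_pool(conv1d_relu(confidence, self.kernels, self.biases))
        logits = [bi + sum(wij * f for wij, f in zip(wi, feats))
                  for wi, bi in zip(self.w, self.b)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def decide(self, confidence: Sequence[float]) -> str:
        """Pick the highest-probability action."""
        p = self.probs(confidence)
        return ACTIONS[p.index(max(p))]


# Usage on a synthetic trace whose confidence slowly declines.
controller = ConfidenceController(seed=0)
trace = [0.95 - 0.002 * i for i in range(200)]
action = controller.decide(trace)
```

With untrained weights, the chosen action is arbitrary. The sketch only shows how a variable-length confidence trace becomes a three-way HALT / RETHINK / ALTERNATIVE decision.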
Abstract
Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a confidence-guided self-refinement method that achieves competitive accuracy with a fraction of the tokens by placing a lightweight 211k-parameter Conv1D controller atop a frozen LLM. The controller consumes full-trace confidence and decides whether to halt, re-examine the current solution, or try a different approach. This enables targeted self-correction with an average of 2.7 refinement steps per problem and a roughly 190-fold token reduction relative to 512-sample baselines. Across diverse reasoning benchmarks and three open-source models, the controller achieves 92.6 percent precision when it confidently halts, indicating that confidence dynamics reliably signal correctness without ground-truth verification. We extend this approach to CoRefine-Tree, a hybrid sequential-parallel variant that adaptively balances exploration and exploitation, integrates easily with LLM serving systems, and is compatible with external verifiers. By treating confidence as a control signal rather than a correctness guarantee, CoRefine provides a modular primitive for scalable reasoning and for agentic settings with imperfect verifiers.
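The halt / re-examine / try-a-different-approach cycle can be sketched as a sequential loop around a frozen generator. This is a hypothetical illustration, not the paper's code. The function `refine`, the prompt wording, the `max_steps` cap, the stub generator, and the decision rule are all assumptions. The decision rule is a simple mean-confidence threshold standing in for the learned Conv1D controller.

```python
from typing import Callable, List, Tuple

Trace = Tuple[str, List[float]]  # (answer text, per-token confidence)

# Hypothetical instruction wording for the two non-halting actions.
RETHINK_PROMPT = "Re-examine your previous reasoning and correct any mistakes."
ALTERNATIVE_PROMPT = "Solve the problem again using a different approach."


def refine(problem: str,
           generate: Callable[[str], Trace],
           decide: Callable[[List[float]], str],
           max_steps: int = 5) -> Tuple[str, int, int]:
    """Sequential confidence-guided refinement loop (sketch).

    Returns (final answer, refinement steps taken, total tokens generated).
    """
    text, conf = generate(problem)
    tokens = len(conf)
    steps = 0
    while steps < max_steps:
        action = decide(conf)
        if action == "HALT":
            break
        instruction = RETHINK_PROMPT if action == "RETHINK" else ALTERNATIVE_PROMPT
        prompt = f"{problem}\n\nPrevious attempt:\n{text}\n\n{instruction}"
        text, conf = generate(prompt)
        tokens += len(conf)
        steps += 1
    return text, steps, tokens


def make_stub_generate(levels: List[float]) -> Callable[[str], Trace]:
    """Stub 'LLM': each call returns 100 tokens at the next confidence level."""
    it = iter(levels)

    def generate(prompt: str) -> Trace:
        level = next(it)
        return f"answer@{level}", [level] * 100

    return generate


def threshold_decide(conf: List[float]) -> str:
    """Stand-in for the learned controller: halt once mean confidence exceeds 0.8."""
    return "HALT" if sum(conf) / len(conf) > 0.8 else "RETHINK"


answer, steps, tokens = refine("What is 2 + 2?",
                               make_stub_generate([0.65, 0.9, 1.0]),
                               threshold_decide)
```

With this stub, the first trace (confidence 0.65) triggers RETHINK and the second (0.9) triggers HALT. The loop therefore returns `answer@0.9` after one refinement step and 200 generated tokens. Token cost grows with the number of refinement steps rather than with a fixed sample count.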
