Asking LLMs to Verify First is Almost Free Lunch

Shiguang Wu, Quanming Yao

TL;DR

The paper tackles the high costs of improving LLM reasoning by introducing Verification-First prompting, which asks models to verify a candidate answer before solving, leveraging a reverse reasoning path that complements forward chain-of-thought. It generalizes this idea into Iter-VF for test-time scaling, enabling sequential verification that maintains a compact context window. Across diverse benchmarks and model families, VF with random/trivial answers consistently outperforms standard CoT, and Iter-VF outperforms existing test-time strategies with minimal overhead. The approach proves robust in real-world, open-ended tasks and under thought-hidden LLM services, offering a training-free, cost-effective path to stronger reasoning.

Abstract

To enhance the reasoning capabilities of Large Language Models (LLMs) without the high cost of training or extensive test-time sampling, we introduce Verification-First (VF), a strategy that prompts models to verify a provided candidate answer, even a trivial or random one, before generating a solution. This approach triggers a "reverse reasoning" process that is cognitively easier than, and complementary to, standard forward Chain-of-Thought (CoT), effectively invoking the model's critical thinking to reduce logical errors. We further generalize the VF strategy to Iter-VF, a sequential test-time scaling (TTS) method that iteratively cycles the verification-generation process using the model's previous answer. Extensive experiments across various benchmarks (from mathematical reasoning to coding and agentic tasks) and various LLMs (from open-source 1B models to cutting-edge commercial ones) confirm that VF with a random answer consistently outperforms standard CoT with minimal computational overhead, and that Iter-VF outperforms existing TTS strategies.
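The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `Answer:` output convention, and the names `build_vf_prompt`, `extract_answer`, `iter_vf`, and `llm` are all assumptions made for illustration. `llm` stands for any function that maps a prompt string to a response string. Each round passes only the question and the previous answer, which is one way to keep the context compact.

```python
from typing import Callable


def build_vf_prompt(question: str, candidate: str) -> str:
    """Build a Verification-First style prompt (wording is an assumption)."""
    return (
        f"{question}\n"
        f"A possible answer is: {candidate}\n"
        "First verify whether this answer is correct, then solve the problem "
        "step by step. Give your final answer on the last line as "
        "'Answer: <value>'."
    )


def extract_answer(response: str) -> str:
    """Return the text after the last 'Answer:' line, or the whole response."""
    for line in reversed(response.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return response.strip()


def iter_vf(
    question: str,
    llm: Callable[[str], str],
    initial_candidate: str = "1",  # a trivial starting answer
    rounds: int = 3,
) -> str:
    """Sequentially verify-then-solve, feeding each answer into the next round.

    Only the question and the previous answer are sent each round;
    earlier reasoning text is discarded.
    """
    candidate = initial_candidate
    for _ in range(rounds):
        response = llm(build_vf_prompt(question, candidate))
        candidate = extract_answer(response)
    return candidate
```

With `rounds=1` and a trivial `initial_candidate`, this reduces to plain VF prompting. Larger values of `rounds` correspond to the iterative setting, at the cost of one extra model call per round.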

Paper Structure

This paper contains 22 sections, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: A reverse reasoning path (the verification process) can be easier to find and can contain information complementary to the forward reasoning path (standard CoT).
  • Figure 2: VF prompting with a random/trivial answer (right), compared with standard CoT prompting (left).
  • Figure 3: Illustration of (a) VF prompting with a previously generated answer and (b) iterating this process as Iter-VF for test-time scaling.
  • Figure 4: VF prompting consistently outperforms standard CoT prompting.
  • Figure 5: Providing different answers to VF for verification.
  • ...and 1 more figure