Dyve: Thinking Fast and Slow for Dynamic Process Verification

Jianyuan Zhong, Zeju Li, Zhijian Xu, Xiangyu Wen, Qiang Xu

TL;DR

Dyve proposes a dynamic process verifier that combines fast System 1 checks with deep System 2 analyses to detect reasoning errors in large language models. It introduces a step-wise consensus-filtered supervision pipeline that uses Monte Carlo rollouts and LLM judges to curate high-quality training data, enabling effective training of a dual-system verifier. Experiments on ProcessBench and MATH-500 show that Dyve outperforms existing process verifiers and yields strong gains when integrated with proposer LLMs under Best-of-N decoding. The approach improves robustness and efficiency of AI reasoning, offering a practical pathway toward more reliable, transparent multi-step problem solving.

Abstract

We present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman's Systems Theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Leveraging a novel step-wise consensus-filtered process supervision technique that combines Monte Carlo estimation with LLM-based evaluation, Dyve curates high-quality supervision signals from noisy data. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings.
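
To make the idea of consensus filtering concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each sample carries per-step Monte Carlo success rates (the fraction of rollouts from that step that reach the correct answer) and that an LLM judge returns the index of the first erroneous step, or `None` if it finds no error. The names `mc_first_error`, `consensus_filter`, the `threshold` parameter, and the dictionary fields are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence


def mc_first_error(success_rates: Sequence[float],
                   threshold: float = 0.0) -> Optional[int]:
    """Return the index of the first step whose rollout success rate is
    at or below `threshold`, or None if every step looks recoverable.
    (Assumed labeling rule; the paper may use a different one.)"""
    for i, rate in enumerate(success_rates):
        if rate <= threshold:
            return i
    return None


def consensus_filter(samples: List[Dict],
                     judge: Callable[[Dict], Optional[int]]) -> List[Dict]:
    """Keep only samples where the Monte Carlo label and the judge label
    agree on the first erroneous step (or agree that there is none)."""
    kept = []
    for sample in samples:
        mc_label = mc_first_error(sample["success_rates"])
        if mc_label == judge(sample):
            kept.append({**sample, "first_error": mc_label})
    return kept


# Toy data: the judge verdict is stored on the sample for this demo
# instead of calling a real model.
samples = [
    {"id": "a", "success_rates": [0.9, 0.5, 0.0], "judge_first_error": 2},
    {"id": "b", "success_rates": [0.8, 0.0, 0.0], "judge_first_error": 2},
    {"id": "c", "success_rates": [1.0, 0.7], "judge_first_error": None},
]
kept = consensus_filter(samples, judge=lambda s: s["judge_first_error"])
print([s["id"] for s in kept])
```

In this toy run, sample `b` is discarded because the two signals disagree on where the first error occurs. Dropping such disagreements is what turns noisy labels into a smaller, cleaner training set.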

Paper Structure

This paper contains 34 sections, 3 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: (1) LLM self-reflection is unreliable, (2) Binary verification lacks depth, (3) Chain-of-Thought (CoT) verification is deeper but more expensive, (4) GenRM with CoT combines generation and verification without step-wise assessment, (5) Dyve, our proposed framework, dynamically combines fast System 1 and deep System 2 verification.
  • Figure 2: Inference speed comparison on ProcessBench (time per sample, in seconds) for System-1, Dyve, and DeepSeek-R1-14B.
  • Figure 3: Impact of model choice and step-wise consensus filtering on performance across GSM8K, MATH, OlympiadBench, and OmniMATH. The figure illustrates improvements achieved through consensus filtering and step-wise flagging, highlighting the superior performance of the 14B reasoning model over the 7B Llama.
  • Figure 4: Comparison of Dyve, Dyve System 1, and Majority Vote under different generation budgets when integrated with Proposer LLMs (DeepSeek-R1-Distill-Qwen-14B as solid line, Qwen2.5-MATH-7B-Instruct as dotted line).