Table of Contents
Fetching ...

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

Chenrui Cao, Liangcheng Song, Zenan Li, Xinyi Le, Xian Zhang, Hui Xue, Fan Yang

TL;DR

This paper revives the DSP framework as DSP+ by introducing fine-grained neuro-symbolic enhancements across drafting, sketching, and proving phases. It demonstrates that, without additional model training, DSP+ achieves competitive accuracy on miniF2F, ProofNet, and PutnamBench, and even solves an IMO problem, while revealing formalization errors in existing datasets. Through ablations, ensemble configurations, and robust analysis, the work shows how coordinated reasoning models, step provers, and symbolic search can rival RL-trained theorem provers with improved token efficiency. The approach is implemented in Lean 4 with open sourcing, highlighting a practical, complementary route to advanced theorem proving beyond RL-scale training.

Abstract

Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces \textbf{DSP+}, an improved version of the Draft, Sketch, and Prove framework, featuring a \emph{fine-grained and integrated} neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) In the proving phase, we tightly integrate symbolic search methods like Aesop with step provers to establish proofs for the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7\%, 32.8\%, and 24 out of 644 problems from miniF2F, ProofNet, and PutnamBench, respectively, while requiring fewer budgets compared to state-of-the-arts. DSP+ proves \texttt{imo\_2019\_p1}, an IMO problem in miniF2F that is not solved by any prior work. Additionally, DSP+ generates proof patterns comprehensible by human experts, facilitating the identification of formalization errors; For example, eight wrongly formalized statements in miniF2F are discovered. Our results highlight the potential of classical reasoning patterns besides the RL-based training. All components will be open-sourced.

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

TL;DR

This paper revives the DSP framework as DSP+ by introducing fine-grained neuro-symbolic enhancements across drafting, sketching, and proving phases. It demonstrates that, without additional model training, DSP+ achieves competitive accuracy on miniF2F, ProofNet, and PutnamBench, and even solves an IMO problem, while revealing formalization errors in existing datasets. Through ablations, ensemble configurations, and robust analysis, the work shows how coordinated reasoning models, step provers, and symbolic search can rival RL-trained theorem provers with improved token efficiency. The approach is implemented in Lean 4 with open sourcing, highlighting a practical, complementary route to advanced theorem proving beyond RL-scale training.

Abstract

Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces \textbf{DSP+}, an improved version of the Draft, Sketch, and Prove framework, featuring a \emph{fine-grained and integrated} neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) In the proving phase, we tightly integrate symbolic search methods like Aesop with step provers to establish proofs for the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7\%, 32.8\%, and 24 out of 644 problems from miniF2F, ProofNet, and PutnamBench, respectively, while requiring fewer budgets compared to state-of-the-arts. DSP+ proves \texttt{imo\_2019\_p1}, an IMO problem in miniF2F that is not solved by any prior work. Additionally, DSP+ generates proof patterns comprehensible by human experts, facilitating the identification of formalization errors; For example, eight wrongly formalized statements in miniF2F are discovered. Our results highlight the potential of classical reasoning patterns besides the RL-based training. All components will be open-sourced.

Paper Structure

This paper contains 41 sections, 9 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: The best achieved results and the inference tokens used of top solutions on miniF2F-test. DSP+, with inference only, can achieve comparable accuracy using fewer tokens.
  • Figure 2: The workflow of DSP+ consists of three phases. Every phase is enhanced with neural models and symbolic rules in consideration of itself and other phases.
  • Figure 3: Ablation of DSP+ components, which shows the synergy in DSP+.
  • Figure 4: Ablation of Draft Formats. Free formatting is better as more informative.
  • Figure 5: Ablation of Sketch Optimizations. Both optimizations are effective.
  • ...and 5 more figures