Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

Xingguang Ji; Yahui Liu; Qi Wang; Jingyuan Zhang; Yang Yue; Rui Shi; Chenxi Sun; Fuzheng Zhang; Guorui Zhou; Kun Gai

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, Kun Gai

TL;DR

This work tackles the challenge of enabling small LLMs to generate formal proofs by tightly integrating a Lean 4 verifier into a multi-turn reasoning loop. The authors introduce Leanabell-Prover-V2, a 7B model that employs verifier-guided long chain-of-thought and reinforcement learning (via DAPO) to refine proof steps, with feedback from the Lean 4 verifier guiding rewrites and policy updates. Through cold-start data design, verifier-aware RL, and data from external formalization tools, the approach achieves state-of-the-art-like performance on MiniF2F among 7B models and shows notable gains on ProofNet and ProverBench benchmarks, including AIME-style problems. The results demonstrate improved sample efficiency and the model’s capacity to reflect on and correct its reasoning in the presence of verifier feedback, suggesting practical pathways for scalable, reliable formal reasoning in smaller LLMs.

Abstract

We introduce our Leanabell-Prover-V2, a 7B large language models (LLMs) that can produce formal theorem proofs in Lean 4, with verifier-integrated Long Chain-of-Thoughts (CoT). Following our previous work Leanabell-Prover-V1, we continual to choose to posttrain existing strong prover models for further performance improvement. In our V2 version, we mainly upgrade the Reinforcement Learning (RL) with feedback provided by the Lean 4 verifier. Crucially, verifier feedback, such as indicating success or detailing specific errors, allows the LLM to become ``self-aware'' of the correctness of its own reasoning process and learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2 improves performance by 3.2% (pass@128) with Kimina-Prover-Preview-Distill-7B and 2.0% (pass@128) with DeepSeek-Prover-V2-7B on the MiniF2F test set. The source codes, curated data and models are available at: https://github.com/Leanabell-LM/Leanabell-Prover-V2.

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

TL;DR

Abstract

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (13)