Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Jingyuan Zhang; Qi Wang; Xingguang Ji; Yahui Liu; Yang Yue; Fuzheng Zhang; Di Zhang; Guorui Zhou; Kun Gai

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai

TL;DR

This work investigates posttraining scaling for formal reasoning in Lean 4 by combining large-scale continual training on hybrid statement-proof data with cognitive-behavioral data and GRPO-based reinforcement learning guided by Lean verifier rewards. It introduces Leanabell-Prover, built from SFT baselines, augmented with Lean Completion and Rewriting to instill self-reflection, and then refined via RL to optimize whole-proof generation. The approach yields state-of-the-art MiniF2F performance (59.8% pass@32) on Goedel-Prover-SFT and related baselines, demonstrating meaningful gains from data-centric and cognitive strategies, though challenges remain with weak base LLMs and RL stability. The work provides reproducible data, model checkpoints, and analysis of exploration dynamics, outlining future directions to bridge natural-language reasoning with formal proof generation. Overall, it advances the frontier of formal reasoning with LLMs, offering scalable data pipelines and RL-based improvements for Lean4 theorem proving.

Abstract

Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 codes. However, ATP has not yet be revolutionized by the recent posttraining scaling as demonstrated by Open AI O1/O3 and Deepseek R1. In this work, we investigate the entire posttraining of ATP, aiming to align it with breakthroughs in reasoning models in natural languages. To begin, we continual train current ATP models with a hybrid dataset, which consists of numerous statement-proof pairs, and additional data aimed at incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning with the use of outcome reward returned by Lean 4 compiler. Through our designed continual training and reinforcement learning processes, we have successfully improved existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in the field of whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on MiniF2F. This is an on-going project and we will progressively update our findings, release our data and training details.

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

TL;DR

Abstract

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (6)