Table of Contents
Fetching ...

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, Thomas Hanwen Zhu

TL;DR

Seed-Prover 1.5 targets the gap between natural-language reasoning and formal Lean proofs by training an agentic prover with large-scale reinforcement learning, complemented by a sketch model that translates NL proofs into Lean sketches and a hierarchical test-time workflow that coordinates NL proving, sketching, and lemma verification. The approach yields strong performance across undergraduate and graduate benchmarks, solving 88% of PutnamBench, 80% of Fate-H, 33% of Fate-X, and 11/12 Putnam 2025 problems within a few hours under modest compute. The work demonstrates that scaling learning from experience with high-quality formal feedback can significantly improve formal mathematical reasoning efficiency and capability, and it highlights a viable path toward bridging natural and formal languages in mathematical proof systems.

Abstract

Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

TL;DR

Seed-Prover 1.5 targets the gap between natural-language reasoning and formal Lean proofs by training an agentic prover with large-scale reinforcement learning, complemented by a sketch model that translates NL proofs into Lean sketches and a hierarchical test-time workflow that coordinates NL proving, sketching, and lemma verification. The approach yields strong performance across undergraduate and graduate benchmarks, solving 88% of PutnamBench, 80% of Fate-H, 33% of Fate-X, and 11/12 Putnam 2025 problems within a few hours under modest compute. The work demonstrates that scaling learning from experience with high-quality formal feedback can significantly improve formal mathematical reasoning efficiency and capability, and it highlights a viable path toward bridging natural and formal languages in mathematical proof systems.

Abstract

Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Paper Structure

This paper contains 24 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Performance of Seed-Prover 1.5 compare to other state-of-the-art provers.
  • Figure 2: An example of Seed-Prover 1.5 agentic prover running on FATE-H problem.
  • Figure 3: Training dynamics and evaluation metrics over 1200 RL training steps.
  • Figure 4: Other training metrics in our agentic RL training.
  • Figure 5: Distribution of search calls for proved samples using different checkpoints. The charts illustrate the percentage of successfully proved samples stratified by the number of search calls ($\le 10$, $11-20$, and $>20$) for the (a) Putnam and (b) Fate-H benchmarks. The number of search tool calls per sample here include all trajectories (i.e., summary-based trajectory)
  • ...and 2 more figures