Table of Contents
Fetching ...

SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

Xueliang Zhao, Lin Zheng, Haige Bo, Changran Hu, Urmish Thakker, Lingpeng Kong

TL;DR

This work tackles data scarcity and multi-step reasoning in formal theorem proving by introducing SubgoalXL, a framework that fuses subgoal-based proofs with an expert-learning loop in the Isabelle environment. It jointly optimizes a formal-proof generator, a formal-statement generator, and a subgoal generator (along with a posterior subgoal model) through iterative distribution refinement and gradient-based rewards to maximize the utility of limited data. Empirically, SubgoalXL achieves a new state-of-the-art of $56.1\%$ on miniF2F in Isabelle, improving by $4.9\%$ absolute, and solves 41 AMC12, 9 AIME, and 3 IMO problems, while ablations confirm the critical role of subgoal-based supervision and iterative refinement. The approach demonstrates that targeted guidance and data-efficient supervision can significantly boost LLM-based formal theorem proving, offering practical implications for AI-assisted mathematical reasoning and automated verification. The authors provide code at the project URL to promote reproducibility and further development.

Abstract

Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical challenges: the scarcity of specialized mathematics and theorem-proving data, and the need for improved multi-step reasoning abilities in LLMs. By optimizing data efficiency and employing subgoal-level supervision, SubgoalXL extracts richer information from limited human-generated proofs. The framework integrates subgoal-oriented proof strategies with an expert learning system, iteratively refining formal statement, proof, and subgoal generators. Leveraging the Isabelle environment's advantages in subgoal-based proofs, SubgoalXL achieves a new state-of-the-art performance of 56.1\% in Isabelle on the standard miniF2F dataset, marking an absolute improvement of 4.9\%. Notably, SubgoalXL successfully solves 41 AMC12, 9 AIME, and 3 IMO problems from miniF2F. These results underscore the effectiveness of maximizing limited data utility and employing targeted guidance for complex reasoning in formal theorem proving, contributing to the ongoing advancement of AI reasoning capabilities. The implementation is available at \url{https://github.com/zhaoxlpku/SubgoalXL}.

SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

TL;DR

This work tackles data scarcity and multi-step reasoning in formal theorem proving by introducing SubgoalXL, a framework that fuses subgoal-based proofs with an expert-learning loop in the Isabelle environment. It jointly optimizes a formal-proof generator, a formal-statement generator, and a subgoal generator (along with a posterior subgoal model) through iterative distribution refinement and gradient-based rewards to maximize the utility of limited data. Empirically, SubgoalXL achieves a new state-of-the-art of on miniF2F in Isabelle, improving by absolute, and solves 41 AMC12, 9 AIME, and 3 IMO problems, while ablations confirm the critical role of subgoal-based supervision and iterative refinement. The approach demonstrates that targeted guidance and data-efficient supervision can significantly boost LLM-based formal theorem proving, offering practical implications for AI-assisted mathematical reasoning and automated verification. The authors provide code at the project URL to promote reproducibility and further development.

Abstract

Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical challenges: the scarcity of specialized mathematics and theorem-proving data, and the need for improved multi-step reasoning abilities in LLMs. By optimizing data efficiency and employing subgoal-level supervision, SubgoalXL extracts richer information from limited human-generated proofs. The framework integrates subgoal-oriented proof strategies with an expert learning system, iteratively refining formal statement, proof, and subgoal generators. Leveraging the Isabelle environment's advantages in subgoal-based proofs, SubgoalXL achieves a new state-of-the-art performance of 56.1\% in Isabelle on the standard miniF2F dataset, marking an absolute improvement of 4.9\%. Notably, SubgoalXL successfully solves 41 AMC12, 9 AIME, and 3 IMO problems from miniF2F. These results underscore the effectiveness of maximizing limited data utility and employing targeted guidance for complex reasoning in formal theorem proving, contributing to the ongoing advancement of AI reasoning capabilities. The implementation is available at \url{https://github.com/zhaoxlpku/SubgoalXL}.
Paper Structure (37 sections, 4 equations, 14 figures, 4 tables)

This paper contains 37 sections, 4 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Left: Examples of informal statement, informal proof, formal statement, formal proof, and subgoal-based proof. Right: Overview of the subgoal-based expert learning framework. Abbreviations: "Stat." for "Statement", "F." for "Formal", and "P." for "Posterior". Each iteration samples subgoal-based proofs, formal statements, and formal proofs from their optimal distributions.
  • Figure 2: Pass-rate comparisons across different iterations on the miniF2F dataset.
  • Figure 3: Synthetic proof pass-rate over iterations.
  • Figure 4: Counts of Different Error Types
  • Figure 5: Case study comparing subgoal-based and informal proofs. The left example shows a successful attempt using subgoal-based proofs, while the right examples depict failed attempts with informal proofs.
  • ...and 9 more figures