Table of Contents
Fetching ...

Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs

David Yin, Jing Gao

TL;DR

The paper addresses the scarcity of large Lean proof datasets by introducing LeanNavigator, which constructs a state-transition graph of Lean theorems and uses tactic templates with embedding-based retrieval to generate millions of new theorems with proofs. Applied to Mathlib4, this yields $4.7$ million theorems totaling $1$ billion tokens, enabling a model that outperforms the prior ReProver baseline on MIL and MiniF2F benchmarks. The work demonstrates that dataset scale is a key driver of automated theorem-proving performance and shows practical potential for scalable, formal proof generation. It also outlines an extensible framework for dataset generation via state graphs and templated tactics that could generalize to other proof assistants.

Abstract

Large Language Models (LLMs) have demonstrated significant potential in generating mathematical proofs. However, a persistent challenge is that LLMs occasionally make mistakes, while even a minor mistake can invalidate an entire proof. Proof assistants like Lean offer a great remedy. They are designed for verifying each step of a proof in a formal language, and in recent years researchers have created AI models to generate proofs in their languages. However, the scarcity of large-scale datasets of Lean proofs restrict the performance of such Automated Theorem Proving (ATP) models. We developed LeanNavigator, a novel method for generating a large-scale dataset of Lean theorems and proofs by finding new ways to prove existing Lean theorems. By leveraging an interactive Lean client and an efficient method for proof step generation, LeanNavigator efficiently produces new theorems with corresponding proofs. Applying this approach to Mathlib4, we generated 4.7 million theorems totaling 1 billion tokens, surpassing previous datasets by more than an order of magnitude. Using this extensive dataset, we trained an AI model that outperforms the state-of-the-art ReProver model in theorem-proving tasks. These results confirm our hypothesis and demonstrate the critical role of large datasets in improving the performance of automated theorem provers.

Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs

TL;DR

The paper addresses the scarcity of large Lean proof datasets by introducing LeanNavigator, which constructs a state-transition graph of Lean theorems and uses tactic templates with embedding-based retrieval to generate millions of new theorems with proofs. Applied to Mathlib4, this yields million theorems totaling billion tokens, enabling a model that outperforms the prior ReProver baseline on MIL and MiniF2F benchmarks. The work demonstrates that dataset scale is a key driver of automated theorem-proving performance and shows practical potential for scalable, formal proof generation. It also outlines an extensible framework for dataset generation via state graphs and templated tactics that could generalize to other proof assistants.

Abstract

Large Language Models (LLMs) have demonstrated significant potential in generating mathematical proofs. However, a persistent challenge is that LLMs occasionally make mistakes, while even a minor mistake can invalidate an entire proof. Proof assistants like Lean offer a great remedy. They are designed for verifying each step of a proof in a formal language, and in recent years researchers have created AI models to generate proofs in their languages. However, the scarcity of large-scale datasets of Lean proofs restrict the performance of such Automated Theorem Proving (ATP) models. We developed LeanNavigator, a novel method for generating a large-scale dataset of Lean theorems and proofs by finding new ways to prove existing Lean theorems. By leveraging an interactive Lean client and an efficient method for proof step generation, LeanNavigator efficiently produces new theorems with corresponding proofs. Applying this approach to Mathlib4, we generated 4.7 million theorems totaling 1 billion tokens, surpassing previous datasets by more than an order of magnitude. Using this extensive dataset, we trained an AI model that outperforms the state-of-the-art ReProver model in theorem-proving tasks. These results confirm our hypothesis and demonstrate the critical role of large datasets in improving the performance of automated theorem provers.

Paper Structure

This paper contains 15 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Lean proof of a very simple theorem "(h: s $\subseteq$ t) : s $\cap$ u $\subseteq$ t $\cap$ u". There is a hypothesis h which says "s $\subseteq$ t", and the target is to prove "s $\cap$ u $\subseteq$ t $\cap$ u". The proof contains two tactics. The first tactic "intro x xsu" introduces a new variable x and new hypothesis "xsu : x $\in$ s $\cap$ u", i.e., x belongs to the left-hand side of the target. It transforms the initial state into "h: s $\subseteq$ t, xsu : x $\in$ s $\cap$ u $\vdash$ x $\in$ t $\cap$ u". The second tactic "exact $\langle$ h xsu.1, xsu.2 $\rangle$" first applies h and xsu.1 (i.e., "s $\subseteq$ t" and "x $\in$ s") on the target, and then applies xsu.2 (i.e., "x $\in$ u") on the target, which finishes the proof.
  • Figure 2: An example Lean state graph of theorem "theorem my_mul_comm_assoc (a b c : $\mathbb{R}$) a * b * c = b * (a * c)". Please note a randomly chosen tactic is seldom applicable to a state, and therefore we need to train models to predict useful tactics.