Table of Contents
Fetching ...

Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method

Rahul Vishwakarma, Subhankar Mishra

TL;DR

DS-Prover introduces a dynamic sampling strategy for neural theorem proving that adjusts the number of tactics applied per step based on the remaining time, improving proof search efficiency. It pairs this with data augmentation that decomposes multi-premise tactics into single-premise variants to enhance premise-based tactic learning, trained with ByT5-Small. Evaluations on Lean/Mathlib data show state-of-the-art $Pass@1$ on ProofNet (14.2%) and strong MiniF2F results (29.8%), with robust cumulative performance, and the dynamic approach consistently outperforms fixed sampling. Additional contributions include LeanDojo optimizations and a public Lean3 proving website, collectively advancing practical neural theorem proving.

Abstract

Theorem proving is a fundamental task in mathematics. With the advent of large language models (LLMs) and interactive theorem provers (ITPs) like Lean, there has been growing interest in integrating LLMs and ITPs to automate theorem proving. In this approach, the LLM generates proof steps (tactics), and the ITP checks the applicability of the tactics at the current goal. The two systems work together to complete the proof. In this paper, we introduce DS-Prover, a novel dynamic sampling method for theorem proving. This method dynamically determines the number of tactics to apply to expand the current goal, taking into account the remaining time compared to the total allocated time for proving a theorem. This makes the proof search process more efficient by adjusting the balance between exploration and exploitation as time passes. We also augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises. This gives the model more examples to learn from and helps it to predict the tactics with premises more accurately. We perform our experiments using the Mathlib dataset of the Lean theorem prover and report the performance on two standard datasets, MiniF2F and ProofNet. Our methods achieve significant performance gains on both datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the best-reported Pass@1 of 29.6% using Lean.

Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method

TL;DR

DS-Prover introduces a dynamic sampling strategy for neural theorem proving that adjusts the number of tactics applied per step based on the remaining time, improving proof search efficiency. It pairs this with data augmentation that decomposes multi-premise tactics into single-premise variants to enhance premise-based tactic learning, trained with ByT5-Small. Evaluations on Lean/Mathlib data show state-of-the-art on ProofNet (14.2%) and strong MiniF2F results (29.8%), with robust cumulative performance, and the dynamic approach consistently outperforms fixed sampling. Additional contributions include LeanDojo optimizations and a public Lean3 proving website, collectively advancing practical neural theorem proving.

Abstract

Theorem proving is a fundamental task in mathematics. With the advent of large language models (LLMs) and interactive theorem provers (ITPs) like Lean, there has been growing interest in integrating LLMs and ITPs to automate theorem proving. In this approach, the LLM generates proof steps (tactics), and the ITP checks the applicability of the tactics at the current goal. The two systems work together to complete the proof. In this paper, we introduce DS-Prover, a novel dynamic sampling method for theorem proving. This method dynamically determines the number of tactics to apply to expand the current goal, taking into account the remaining time compared to the total allocated time for proving a theorem. This makes the proof search process more efficient by adjusting the balance between exploration and exploitation as time passes. We also augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises. This gives the model more examples to learn from and helps it to predict the tactics with premises more accurately. We perform our experiments using the Mathlib dataset of the Lean theorem prover and report the performance on two standard datasets, MiniF2F and ProofNet. Our methods achieve significant performance gains on both datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the best-reported Pass@1 of 29.6% using Lean.
Paper Structure (25 sections, 1 equation, 16 figures, 4 tables)

This paper contains 25 sections, 1 equation, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Navigating the Proof Search Tree with Dynamic Sampling: Deciding the number of tactics (n) to sample as time (t) progresses in the proof search, guided by Equation \ref{['eq:decay']}.
  • Figure 2: Venn diagram illustrating subsets of theorems proved from all the test theorems in Mathlib, Minif2f, and Proofnet using different proving methods.
  • Figure 3: Comparison of Theorems Proved Using Fixed and Dynamic Sampling Methods on 500 Theorems of Mathlib's Test Data with Model Trained on Augmented Dataset.
  • Figure 4: Illustration of the Website Successfully Proving a Theorem.
  • Figure 5: Venn diagram illustrating subsets of theorems proved from the test theorems in Mathlib dataset using different proving methods.
  • ...and 11 more figures