Table of Contents
Fetching ...

Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

Shaonan Wu, Shuai Lu, Yeyun Gong, Nan Duan, Ping Wei

TL;DR

Alchemy addresses data scarcity in neural theorem proving by synthesizing theorems in Lean's symbolic space through targeted mutations of Mathlib statements. The method identifies invocable theorems via Lean tactics, mutates candidate statements, constructs proofs using the have tactic, and verifies them in Lean, creating an expansive augmented corpus. This corpus is used for continual pretraining and supervised finetuning, yielding measurable improvements on Leandojo novel_premises and out-of-distribution miniF2F benchmarks. The work provides detailed analysis of synthetic data composition, training paradigms, and practical guidance for building stronger theorem provers, with open-source tooling to enable further research.

Abstract

Formal proofs are challenging to write even for experienced experts. Recent progress in Neural Theorem Proving (NTP) shows promise in expediting this process. However, the formal corpora available on the Internet are limited compared to the general text, posing a significant data scarcity challenge for NTP. To address this issue, this work proposes Alchemy, a general framework for data synthesis that constructs formal theorems through symbolic mutation. Specifically, for each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. Subsequently, we mutate the candidate theorem by replacing the corresponding term in the statement with its equivalent form or antecedent. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore, we perform continual pretraining and supervised finetuning on this augmented corpus for large language models. Experimental results demonstrate the effectiveness of our approach, achieving a 4.70% absolute performance improvement on Leandojo benchmark. Additionally, our approach achieves a 2.47% absolute performance gain on the out-of-distribution miniF2F benchmark based on the synthetic data.To provide further insights, we conduct a comprehensive analysis of synthetic data composition and the training paradigm, offering valuable guidance for developing a strong theorem prover.

Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

TL;DR

Alchemy addresses data scarcity in neural theorem proving by synthesizing theorems in Lean's symbolic space through targeted mutations of Mathlib statements. The method identifies invocable theorems via Lean tactics, mutates candidate statements, constructs proofs using the have tactic, and verifies them in Lean, creating an expansive augmented corpus. This corpus is used for continual pretraining and supervised finetuning, yielding measurable improvements on Leandojo novel_premises and out-of-distribution miniF2F benchmarks. The work provides detailed analysis of synthetic data composition, training paradigms, and practical guidance for building stronger theorem provers, with open-source tooling to enable further research.

Abstract

Formal proofs are challenging to write even for experienced experts. Recent progress in Neural Theorem Proving (NTP) shows promise in expediting this process. However, the formal corpora available on the Internet are limited compared to the general text, posing a significant data scarcity challenge for NTP. To address this issue, this work proposes Alchemy, a general framework for data synthesis that constructs formal theorems through symbolic mutation. Specifically, for each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. Subsequently, we mutate the candidate theorem by replacing the corresponding term in the statement with its equivalent form or antecedent. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore, we perform continual pretraining and supervised finetuning on this augmented corpus for large language models. Experimental results demonstrate the effectiveness of our approach, achieving a 4.70% absolute performance improvement on Leandojo benchmark. Additionally, our approach achieves a 2.47% absolute performance gain on the out-of-distribution miniF2F benchmark based on the synthetic data.To provide further insights, we conduct a comprehensive analysis of synthetic data composition and the training paradigm, offering valuable guidance for developing a strong theorem prover.

Paper Structure

This paper contains 38 sections, 15 figures, 7 tables, 3 algorithms.

Figures (15)

  • Figure 1: The overview of our synthesis pipeline. At the theorem level, we find invocable theorems that can be used to rewrite or apply to the assumptions or assertion of the candidate statement, such as the iff and implication rules about the Coprime. Then, we construct the new statements by replacing the specific component with its equivalent form or antecedent. At the proof tree level, our method merges two existing proof trees.
  • Figure 2: Prompt template
  • Figure 3: Distribution of mathematical subjects. For each employed tactic, we mix the generated variants with the original theorems. a) The distribution of Mathlib. b) The distribution of Mathlib + rw. c) The distribution of Mathlib + apply.
  • Figure 4: Influence of the quantity of synthesized data points.
  • Figure 5: Examples of invocable theorems for apply
  • ...and 10 more figures