Table of Contents
Fetching ...

A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation

Beibei Xiong, Hangyu Lv, Haojia Shan, Jianlin Wang, Zhengfeng Yang, Lihong Zhi

TL;DR

This paper addresses data scarcity in automated theorem proving for combinatorial identities by introducing LeanComb, the first formal Lean 4 benchmark for this domain, and augmenting it with ATG4CI, a framework that fuses self-improving LLMs with reinforcement-learning tree search to generate theorems and complete proofs. The authors build Lean4Kit to extract state-tactic data from Lean proofs, construct partial proof paths, and iteratively generate and validate new theorems, culminating in the LeanComb-Enhanced dataset with 260,466 theorems. Empirical results show substantial improvements in proof success across multiple LLMs on LeanComb and miniF2F after training on the augmented data, with pass@1 gains up to 26% on LeanComb and notable gains on miniF2F. These contributions advance automated reasoning in combinatorics by providing a scalable data-generating pipeline and a rigorous benchmark for evaluating ATP systems in a challenging mathematical subfield.

Abstract

Large language models (LLMs) have significantly advanced formal theorem proving, yet the scarcity of high-quality training data constrains their capabilities in complex mathematical domains. Combinatorics, a cornerstone of mathematics, provides essential tools for analyzing discrete structures and solving optimization problems. However, its inherent complexity makes it particularly challenging for automated theorem proving (ATP) for combinatorial identities. To address this, we manually construct LeanComb, combinatorial identities benchmark in Lean, which is, to our knowledge, the first formalized theorem proving benchmark built for combinatorial identities. We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for tactic prediction. By utilizing ATG4CI, we generate a LeanComb-Enhanced dataset comprising 260K combinatorial identities theorems, each with a complete formal proof in Lean, and experimental evaluations demonstrate that models trained on this dataset can generate more effective tactics, thereby improving success rates in automated theorem proving for combinatorial identities.

A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation

TL;DR

This paper addresses data scarcity in automated theorem proving for combinatorial identities by introducing LeanComb, the first formal Lean 4 benchmark for this domain, and augmenting it with ATG4CI, a framework that fuses self-improving LLMs with reinforcement-learning tree search to generate theorems and complete proofs. The authors build Lean4Kit to extract state-tactic data from Lean proofs, construct partial proof paths, and iteratively generate and validate new theorems, culminating in the LeanComb-Enhanced dataset with 260,466 theorems. Empirical results show substantial improvements in proof success across multiple LLMs on LeanComb and miniF2F after training on the augmented data, with pass@1 gains up to 26% on LeanComb and notable gains on miniF2F. These contributions advance automated reasoning in combinatorics by providing a scalable data-generating pipeline and a rigorous benchmark for evaluating ATP systems in a challenging mathematical subfield.

Abstract

Large language models (LLMs) have significantly advanced formal theorem proving, yet the scarcity of high-quality training data constrains their capabilities in complex mathematical domains. Combinatorics, a cornerstone of mathematics, provides essential tools for analyzing discrete structures and solving optimization problems. However, its inherent complexity makes it particularly challenging for automated theorem proving (ATP) for combinatorial identities. To address this, we manually construct LeanComb, combinatorial identities benchmark in Lean, which is, to our knowledge, the first formalized theorem proving benchmark built for combinatorial identities. We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for tactic prediction. By utilizing ATG4CI, we generate a LeanComb-Enhanced dataset comprising 260K combinatorial identities theorems, each with a complete formal proof in Lean, and experimental evaluations demonstrate that models trained on this dataset can generate more effective tactics, thereby improving success rates in automated theorem proving for combinatorial identities.

Paper Structure

This paper contains 28 sections, 6 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: The Framework of ATG4CI.
  • Figure 2: Distribution of Theorem Numbers on Prediction Steps
  • Figure 3: Theorem count distribution across proof steps in LeanComb benchmark.
  • Figure 4: Theorem count distribution across proof steps in LeanComb-Enhanced dataset.
  • Figure 5: Theorem distribution by proof Steps
  • ...and 2 more figures