Table of Contents
Fetching ...

EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty

Yuchen Tian, Ruiyuan Huang, Xuanwu Wang, Jing Ma, Zengfeng Huang, Ziyang Luo, Hongzhan Lin, Da Zheng, Lun Du

TL;DR

The paper tackles the brittleness of LLMs in formal theorem proving by introducing a symmetry- and difficulty-aware data augmentation pipeline. It combines EvolDomain (domain translation), EvolDifficulty (difficulty shaping), and EvolAST (AST-based syntactic diversification) to generate diverse, semantically equivalent formal problems, which are used to train EvolProver, a 7B non-reasoning prover. EvolProver sets new state-of-the-art results on FormalMATH-Lite and achieves competitive performance on MiniF2F, Ineq-Comp, and related benchmarks, with ablations confirming the value of each augmentation component. The work demonstrates that carefully designed data generation and verification can significantly boost generalization and robustness in formal reasoning models, even at modest model scales.

Abstract

Large Language Models (LLMs) for formal theorem proving have shown significant promise, yet they often lack generalizability and are fragile to even minor transformations of problem statements. To address this limitation, we introduce a novel data augmentation pipeline designed to enhance model robustness from two perspectives: symmetry and difficulty. From the symmetry perspective, we propose two complementary methods: EvolAST, an Abstract Syntax Tree (AST) based approach that targets syntactic symmetry to generate semantically equivalent problem variants, and EvolDomain, which leverages LLMs to address semantic symmetry by translating theorems across mathematical domains. From the difficulty perspective, we propose EvolDifficulty, which uses carefully designed evolutionary instructions to guide LLMs in generating new theorems with a wider range of difficulty. We then use the evolved data to train EvolProver, a 7B-parameter non-reasoning theorem prover. EvolProver establishes a new state-of-the-art (SOTA) on FormalMATH-Lite with a 53.8% pass@32 rate, surpassing all models of comparable size, including reasoning-based models. It also sets new SOTA records for non-reasoning models on MiniF2F-Test (69.8% pass@32), Ineq-Comp-Seed (52.2% pass@32), and Ineq-Comp-Transformed (34.0% pass@32). Ablation studies further confirm our data augmentation pipeline's effectiveness across multiple benchmarks.

EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty

TL;DR

The paper tackles the brittleness of LLMs in formal theorem proving by introducing a symmetry- and difficulty-aware data augmentation pipeline. It combines EvolDomain (domain translation), EvolDifficulty (difficulty shaping), and EvolAST (AST-based syntactic diversification) to generate diverse, semantically equivalent formal problems, which are used to train EvolProver, a 7B non-reasoning prover. EvolProver sets new state-of-the-art results on FormalMATH-Lite and achieves competitive performance on MiniF2F, Ineq-Comp, and related benchmarks, with ablations confirming the value of each augmentation component. The work demonstrates that carefully designed data generation and verification can significantly boost generalization and robustness in formal reasoning models, even at modest model scales.

Abstract

Large Language Models (LLMs) for formal theorem proving have shown significant promise, yet they often lack generalizability and are fragile to even minor transformations of problem statements. To address this limitation, we introduce a novel data augmentation pipeline designed to enhance model robustness from two perspectives: symmetry and difficulty. From the symmetry perspective, we propose two complementary methods: EvolAST, an Abstract Syntax Tree (AST) based approach that targets syntactic symmetry to generate semantically equivalent problem variants, and EvolDomain, which leverages LLMs to address semantic symmetry by translating theorems across mathematical domains. From the difficulty perspective, we propose EvolDifficulty, which uses carefully designed evolutionary instructions to guide LLMs in generating new theorems with a wider range of difficulty. We then use the evolved data to train EvolProver, a 7B-parameter non-reasoning theorem prover. EvolProver establishes a new state-of-the-art (SOTA) on FormalMATH-Lite with a 53.8% pass@32 rate, surpassing all models of comparable size, including reasoning-based models. It also sets new SOTA records for non-reasoning models on MiniF2F-Test (69.8% pass@32), Ineq-Comp-Seed (52.2% pass@32), and Ineq-Comp-Transformed (34.0% pass@32). Ablation studies further confirm our data augmentation pipeline's effectiveness across multiple benchmarks.

Paper Structure

This paper contains 36 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: An example of problems evolved by EvolDomain, EvolDifficulty, and EvolAST. A seed formal statement is evolved in parallel by EvolDomain and EvolDifficulty, yielding two new statements. Each of these is then further evolved by EvolAST to generate syntactic variants.
  • Figure 2: The workflow of our data augmentation pipeline comprises three phases: EvolDomain and EvolDifficulty, Verification, and EvolAST.
  • Figure 3: Comparison with SOTA models on the MiniF2F-Test dataset. Pass Rate means pass@32 success rate. Average token length is the average number of tokens generated by models across the benchmark. We categorize models as Non-CoT(non reasoning) and CoT(reasoning)
  • Figure 4: Comparison of the number of candidates passing verification for four evolution methods. Our EvolDomain & EvolDifficulty performs best.
  • Figure 5: Comparison of Mathematical Domain Distribution Before and After EvolDomain.