Table of Contents
Fetching ...

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

Jialiang Sun, Yuzhi Tang, Ao Li, Chris J. Maddison, Kuldeep S. Meel

TL;DR

The Enumerate-Conjecture-Prove (ECP) framework is introduced, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving with formal verification in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions.

Abstract

Mathematical reasoning is central to artificial intelligence, with applications in education, code generation, and research-level mathematical discovery. Mathematical competitions highlight two problem types: theorem proving, requiring rigorous proofs, and answer construction, requiring creative generation and formal verification of mathematical objects. Existing research reveals that LLMs can tackle difficult answer-construction tasks but are prone to errors from hallucinations and unverifiable steps, while symbolic methods guarantee rigor but falter in creative answer construction. This raises a key understudied question: how to solve answer-construction problems while preserving both LLM creativity and mathematical rigor? To address this problem, we introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions. ECP is model agnostic and shows consistent improvements over pure LLM baselines: on the subset of PutnamBench for answer construction, ECP formally solves 6 out of 337 answer-construction problems end to end (up from 4 without ECP) using GPT-5 mini and DeepSeek-Prover-V2-7B. On ConstructiveBench, ECP achieves 33.1% end-to-end state-of-the-art accuracy (up from 32.5%), demonstrating its potential to advance formal mathematical reasoning by combining LLM conjecturing with formal verification. Our code and dataset are publicly available at GitHub (https://github.com/sunjia72/ECP) and Hugging Face (https://huggingface.co/datasets/sunjia72/ConstructiveBench).

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

TL;DR

The Enumerate-Conjecture-Prove (ECP) framework is introduced, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving with formal verification in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions.

Abstract

Mathematical reasoning is central to artificial intelligence, with applications in education, code generation, and research-level mathematical discovery. Mathematical competitions highlight two problem types: theorem proving, requiring rigorous proofs, and answer construction, requiring creative generation and formal verification of mathematical objects. Existing research reveals that LLMs can tackle difficult answer-construction tasks but are prone to errors from hallucinations and unverifiable steps, while symbolic methods guarantee rigor but falter in creative answer construction. This raises a key understudied question: how to solve answer-construction problems while preserving both LLM creativity and mathematical rigor? To address this problem, we introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions. ECP is model agnostic and shows consistent improvements over pure LLM baselines: on the subset of PutnamBench for answer construction, ECP formally solves 6 out of 337 answer-construction problems end to end (up from 4 without ECP) using GPT-5 mini and DeepSeek-Prover-V2-7B. On ConstructiveBench, ECP achieves 33.1% end-to-end state-of-the-art accuracy (up from 32.5%), demonstrating its potential to advance formal mathematical reasoning by combining LLM conjecturing with formal verification. Our code and dataset are publicly available at GitHub (https://github.com/sunjia72/ECP) and Hugging Face (https://huggingface.co/datasets/sunjia72/ConstructiveBench).

Paper Structure

This paper contains 44 sections, 6 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Illustration of the ECP framework applied to a formalized Balkan MO Shortlist problem in OmniMath dataset gao2024omni.
  • Figure 2: Problem sources and domains in ConstructiveBench. Only the top 11 sources and top 7 domains are shown; the remainder are grouped under 'Other'.
  • Figure 3: Overview of the autoformalization pipeline: the LLM drafts a formal statement; when compilation or semantic checks fail, error messages and retrieved Lean references are fed back; the loop repeats until both checks pass.
  • Figure 4: Left: Distribution of answer types in ConstructiveBench (top 8 shown; the remainder are grouped as 'Other'). Right: Example dataset entry.