Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

Jialiang Sun; Yuzhi Tang; Ao Li; Chris J. Maddison; Kuldeep S. Meel

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

Jialiang Sun, Yuzhi Tang, Ao Li, Chris J. Maddison, Kuldeep S. Meel

TL;DR

The Enumerate-Conjecture-Prove (ECP) framework is introduced, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving with formal verification in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions.

Abstract

Mathematical reasoning is central to artificial intelligence, with applications in education, code generation, and research-level mathematical discovery. Mathematical competitions highlight two problem types: theorem proving, requiring rigorous proofs, and answer construction, requiring creative generation and formal verification of mathematical objects. Existing research reveals that LLMs can tackle difficult answer-construction tasks but are prone to errors from hallucinations and unverifiable steps, while symbolic methods guarantee rigor but falter in creative answer construction. This raises a key understudied question: how to solve answer-construction problems while preserving both LLM creativity and mathematical rigor? To address this problem, we introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions. ECP is model agnostic and shows consistent improvements over pure LLM baselines: on the subset of PutnamBench for answer construction, ECP formally solves 6 out of 337 answer-construction problems end to end (up from 4 without ECP) using GPT-5 mini and DeepSeek-Prover-V2-7B. On ConstructiveBench, ECP achieves 33.1% end-to-end state-of-the-art accuracy (up from 32.5%), demonstrating its potential to advance formal mathematical reasoning by combining LLM conjecturing with formal verification. Our code and dataset are publicly available at GitHub (https://github.com/sunjia72/ECP) and Hugging Face (https://huggingface.co/datasets/sunjia72/ConstructiveBench).

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

TL;DR

Abstract

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (4)