Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques
Sangjun Han, Taeil Hur, Youngmi Hur, Kathy Sangkyung Lee, Myungyoon Lee, Hyojae Lim
TL;DR
The paper tackles AI-assisted formal proof generation by pairing a publicly accessible language model with a formal prover, Lean, using simple search strategies to prove problems in miniF2F. It introduces two proof-search regimes, $b$-search and $d$-search, plus a Bad($O$) feedback mechanism, and demonstrates competitive performance without any fine-tuning, including a peak of $pass@\$k\$ = 31.15\$ on miniF2F Lean when combining the two approaches. Cross-dataset and cross-LLM evaluations (ProofNet, AMC 12, and Llemma variants) show robustness, with results such as 13.14% on ProofNet and 28.28% on Llemma-based combos, indicating broader applicability beyond miniF2F. Overall, the work provides a practical, reproducible pathway toward AI-assisted formal proofs and motivates further research on integrating LLMs with formal proof systems.
Abstract
The challenge of formal proof generation has a rich history, but with modern techniques, we may finally be at the stage of making actual progress in real-life mathematical problems. This paper explores the integration of ChatGPT and basic searching techniques to simplify generating formal proofs, with a particular focus on the miniF2F dataset. We demonstrate how combining a large language model like ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation. Despite its simplicity, our best-performing Lean-based model surpasses all known benchmarks with a 31.15% pass rate. We extend our experiments to include other datasets and employ alternative language models, showcasing our models' comparable performance in diverse settings and allowing for a more nuanced analysis of our results. Our findings offer insights into AI-assisted formal proof generation, suggesting a promising direction for future research in formal mathematical proof.
