Table of Contents
Fetching ...

A Survey on Deep Learning for Theorem Proving

Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

TL;DR

This survey critically maps the landscape of deep learning for theorem proving, covering five DL-enabled tasks, diverse datasets, and evaluation paradigms across informal and formal proving settings. It highlights how autoformalization, premise selection, proofstep generation, and proof search integrate with ATP/ITP systems, aided by modern LLMs, GNNs, and structured prompting. Key contributions include a comprehensive catalog of datasets and generation strategies, state-of-the-art performance summaries, and a candid discussion of challenges such as data scarcity and evaluation fragmentation, along with forward-looking directions like conjecturing and verified code generation. The work serves as a foundational reference for researchers aiming to advance AI-assisted formal reasoning and its practical deployment in mathematics and software verification.

Abstract

Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in natural language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a comprehensive survey of deep learning for theorem proving by offering (i) a thorough review of existing approaches across various tasks such as autoformalization, premise selection, proofstep generation, and proof search; (ii) an extensive summary of curated datasets and strategies for synthetic data generation; (iii) a detailed analysis of evaluation metrics and the performance of state-of-the-art methods; and (iv) a critical discussion on the persistent challenges and the promising avenues for future exploration. Our survey aims to serve as a foundational reference for deep learning approaches in theorem proving, inspiring and catalyzing further research endeavors in this rapidly growing field. A curated list of papers is available at https://github.com/zhaoyu-li/DL4TP.

A Survey on Deep Learning for Theorem Proving

TL;DR

This survey critically maps the landscape of deep learning for theorem proving, covering five DL-enabled tasks, diverse datasets, and evaluation paradigms across informal and formal proving settings. It highlights how autoformalization, premise selection, proofstep generation, and proof search integrate with ATP/ITP systems, aided by modern LLMs, GNNs, and structured prompting. Key contributions include a comprehensive catalog of datasets and generation strategies, state-of-the-art performance summaries, and a candid discussion of challenges such as data scarcity and evaluation fragmentation, along with forward-looking directions like conjecturing and verified code generation. The work serves as a foundational reference for researchers aiming to advance AI-assisted formal reasoning and its practical deployment in mathematics and software verification.

Abstract

Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in natural language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a comprehensive survey of deep learning for theorem proving by offering (i) a thorough review of existing approaches across various tasks such as autoformalization, premise selection, proofstep generation, and proof search; (ii) an extensive summary of curated datasets and strategies for synthetic data generation; (iii) a detailed analysis of evaluation metrics and the performance of state-of-the-art methods; and (iv) a critical discussion on the persistent challenges and the promising avenues for future exploration. Our survey aims to serve as a foundational reference for deep learning approaches in theorem proving, inspiring and catalyzing further research endeavors in this rapidly growing field. A curated list of papers is available at https://github.com/zhaoyu-li/DL4TP.
Paper Structure (18 sections, 3 figures, 2 tables)

This paper contains 18 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Papers on deep learning for theorem proving over the years (data for 2024 is up to July).
  • Figure 2: Top: The informal statement and proof of the Fundamental Theorem of Arithmetic in https://proofwiki.org/wiki/Fundamental_Theorem_of_Arithmetic. Bottom Left: The formal statement and proof of the same theorem in the mathlib library mathlib of Lean 4. Bottom Right: The corresponding proof tree illustrating the formal proof process in Lean 4. Only changes in the local context of each node are marked for clarity. The references and premises used in the informal and formal proof are highlighted by underlines and colors respectively.
  • Figure 3: The taxonomy of datasets in theorem proving.