Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs

Garett Cunningham, Razvan C. Bunescu, David Juedes

TL;DR

This work addresses proof verification through autoformalization: translating natural-language proofs into formal Coq representations. It presents a semantic parsing pipeline, based on the Universal Transformer with a copying mechanism, that maps LaTeX statements and proofs, including Hoare-logic code correctness arguments, to Coq. The authors generate grammar-driven datasets for arithmetic theorems and code correctness, supplement them with handwritten samples, and evaluate generalization to intermediate proof lengths and to unseen natural-language phrasings. The models generalize in these settings, but out-of-vocabulary tokens and longer inputs remain challenging. The study demonstrates the feasibility of autoformalization in restricted domains, suggests potential applications in education and software verification, and outlines paths for scaling to broader mathematics and programming languages.

Abstract

The ever-growing complexity of mathematical proofs makes their manual verification by mathematicians very cognitively demanding. Autoformalization seeks to address this by translating proofs written in natural language into a formal representation that is computer-verifiable via interactive theorem provers. In this paper, we introduce a semantic parsing approach, based on the Universal Transformer architecture, that translates elementary mathematical proofs into an equivalent formalization in the language of the Coq interactive theorem prover. The same architecture is also trained to translate simple imperative code decorated with Hoare triples into formally verifiable proofs of correctness in Coq. Experiments on a limited domain of artificial and human-written proofs show that the models generalize well to intermediate lengths not seen during training and variations in natural language.

Paper Structure

This paper contains 13 sections, 1 equation, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Generated example from the even-odd set.
  • Figure 2: Instance of sublemma use in the even-odd dataset. The proof that the sum of non-constant terms is even (stated as an assertion) is given before proving the theorem.
  • Figure 3: Generated composites example.
  • Figure 4: Generated example from the powers set.
  • Figure 5: Generated poly example: [Left] the Hoare logic proof; [Right] the code correctness proof in Coq.