Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs
Garett Cunningham, Razvan C. Bunescu, David Juedes
TL;DR
This work tackles the problem of verifying proofs by translating natural-language proofs into formal Coq representations (autoformalization). It presents a semantic parsing pipeline based on the Universal Transformer with a copying mechanism that maps LaTeX statements and proofs, including Hoare-based code correctness arguments, to Coq. The authors generate grammar-driven datasets for arithmetic theorems and code correctness, supplement them with handwritten samples, and evaluate generalization to intermediate lengths and to unseen natural language. The models generalize well in both settings, while out-of-vocabulary tokens and longer inputs remain challenging. The study demonstrates the feasibility of autoformalization in restricted domains and suggests potential applications in education and software verification, while outlining paths for scaling to broader mathematics and programming languages.
Abstract
The ever-growing complexity of mathematical proofs makes manual verification cognitively demanding for mathematicians. Autoformalization seeks to address this by translating proofs written in natural language into a formal representation that is computer-verifiable via interactive theorem provers. In this paper, we introduce a semantic parsing approach, based on the Universal Transformer architecture, that translates elementary mathematical proofs into an equivalent formalization in the language of the Coq interactive theorem prover. The same architecture is also trained to translate simple imperative code decorated with Hoare triples into formally verifiable proofs of correctness in Coq. Experiments on a limited domain of artificial and human-written proofs show that the models generalize well to intermediate lengths not seen during training and to variations in natural language.
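
As a rough illustration of the kind of input-output pair described here, consider the informal text "Theorem. $4$ is even. Proof. Note that $4 = 2 \cdot 2$. Hence $4$ is even." A minimal Coq sketch of a corresponding formalization might look like the following. The theorem name `four_is_even` and the encoding of evenness as an existential statement are assumptions made for this illustration; they are not taken from the paper's dataset or its target grammar.

```coq
(* Hypothetical example: the name and encoding below are illustrative
   assumptions, not the paper's actual output format. *)

(* "4 is even", stated as: there is a natural number k with 4 = 2 * k. *)
Theorem four_is_even : exists k : nat, 4 = 2 * k.
Proof.
  (* "Note that 4 = 2 * 2": supply the witness k = 2. *)
  exists 2.
  (* Both sides compute to 4, so the equality holds by reflexivity. *)
  reflexivity.
Qed.
```

Each sentence of the informal proof maps onto one tactic here, which is the kind of alignment a semantic parser for this restricted domain would need to learn.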
