Table of Contents
Fetching ...

A New Approach Towards Autoformalization

Nilay Patel, Rahul Saha, Jeffrey Flanigan

TL;DR

This work addresses the difficulty of autoformalizing real-world mathematics by decomposing the task into four subtasks: formalizing theorems and definitions in an unlinked form, linking entities to a formal library, and adjusting types to satisfy a proof assistant's checker. It introduces arXiv2Formal, a beta 50-theorem Lean 3 benchmark, to enable supervised progress on subtask 1 and to foster community contributions. Early GPT-3.5 experiments show that in-context learning can improve translations but reveal persistent issues with typing and context that the proposed linking and type-adjustment steps aim to remedy. By lowering complexity and data requirements, this staged approach aims to scale autoformalization toward real-world, research-level mathematics and broader proof ecosystems, with extensions to Lean 4 on the horizon.

Abstract

Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, and especially for higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.

A New Approach Towards Autoformalization

TL;DR

This work addresses the difficulty of autoformalizing real-world mathematics by decomposing the task into four subtasks: formalizing theorems and definitions in an unlinked form, linking entities to a formal library, and adjusting types to satisfy a proof assistant's checker. It introduces arXiv2Formal, a beta 50-theorem Lean 3 benchmark, to enable supervised progress on subtask 1 and to foster community contributions. Early GPT-3.5 experiments show that in-context learning can improve translations but reveal persistent issues with typing and context that the proposed linking and type-adjustment steps aim to remedy. By lowering complexity and data requirements, this staged approach aims to scale autoformalization toward real-world, research-level mathematics and broader proof ecosystems, with extensions to Lean 4 on the horizon.

Abstract

Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, and especially for higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.
Paper Structure (8 sections, 4 figures, 1 table)

This paper contains 8 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: An example of our proposed autoformalization process, starting from LaTeX theorems (top left red) and definition (top right red) and ending with formalized Lean code (bottom). In step 1 (top left), despite lacking context for the set $X$, the definitions of flubs and glibs, and the function $f$, we are still able to make a plausible translation which we can fill in later. In step 2 (top right), we formalize definitions and determine the correct types of objects. Note that the translations are still not correct, since the type of X hasn't been applied to the other definitions or theorems yet. In step 3 (bottom), we fix this by linking the referenced name X with the generated type X, and we also align the names of other mismatched functions.
  • Figure 2: Distribution of topics of formalized theorems in arXiv2Formal.
  • Figure 3: An example pair from an arXiv paper (arxiv-example-0705.1690) from the benchmark dataset and its formalized versions. Note that we don't know the definitions of semi-Brjuno and Brjuno, but we are still able to formalize the theorem with placeholders.
  • Figure 4: An example pair from an arXiv paper (arxiv-example-0705.1690) from the benchmark dataset and its formalized versions. In this more complicated case, GPT is still able to correctly translate several pieces of the theorem. One mistake is the fact that $f$ should have a range of $\mathbb{C}$ rather than $\mathbb{R}$, but this is a mistake that will be corrected in stage 4 (entity linking). The use of $\|\ldots\|_2$ instead of our preferred norm_l2 is due to the prevalence of this notation in the pretraining corpus.