M2F: Automated Formalization of Mathematical Literature at Scale

Zichen Wang; Wanli Ma; Zhenyu Ming; Gong Zhang; Kun Yuan; Zaiwen Wen

TL;DR

The paper addresses the problem of scaling automated formalization from snippets to textbook- and paper-length material. It introduces M2F, a two-stage, verifier-certified framework for end-to-end formalization inside a fixed Lean environment. Stage 1 (statement compilation) produces declaration skeletons whose proofs may be placeholders; Stage 2 (proof repair) closes those placeholders while keeping signatures fixed. Both stages are guided by VeriRefine, an accept/revert loop that relies exclusively on Lean toolchain feedback. Applied to 479 pages of textbook material, the approach yields a buildable Lean library with span-level provenance. Reported proof success rates (PSR) are 100% on audited long-form material and 96% automatic PSR on the FATE-H benchmark, rising to 97% with a small lemma-map. The work demonstrates that large-scale formalization is practically feasible and provides governance features, namely provenance anchors and import governance, that make the resulting libraries auditable and maintainable. The findings underscore the value of a verifier-in-the-loop workflow for both proving and diagnosing formalization challenges, while also highlighting remaining bottlenecks in natural-language grounding and navigation within large mathematical corpora.
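To make the two stages concrete, here is a minimal Lean 4 sketch of what a Stage 1 skeleton and its Stage 2 completion could look like. The theorem name and the namespaces are hypothetical and chosen only for illustration; they are not taken from the M2F library. The point is that the signature is identical in both versions and only the proof body changes.

```lean
-- Stage 1 (statement compilation): the declaration type-checks,
-- but the proof is a placeholder. Lean accepts `sorry` with a warning.
namespace Stage1
theorem add_zero_example (n : Nat) : n + 0 = n := by
  sorry
end Stage1

-- Stage 2 (proof repair): the signature is unchanged;
-- only the placeholder is replaced by an actual proof.
namespace Stage2
theorem add_zero_example (n : Nat) : n + 0 = n := by
  rfl
end Stage2
```

Because a file containing `sorry` still compiles, the whole project can be brought to a buildable state before any proof is attempted.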

Abstract

Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring that entire projects compile end-to-end. We present M2F (Math-to-Formal), the first agentic framework for end-to-end, project-scale autoformalization in Lean. The framework operates in two stages. The statement compilation stage splits the document into atomic blocks, orders them via inferred dependencies, and repairs declaration skeletons until the project compiles, allowing placeholders in proofs. The proof repair stage closes these holes under fixed signatures using goal-conditioned local edits. Throughout both stages, M2F keeps the verifier in the loop, committing edits only when toolchain feedback confirms improvement. In approximately three weeks, M2F converts long-form mathematical sources into a project-scale Lean library of 153,853 lines from 479 pages of textbooks on real analysis and convex analysis, fully formalized as Lean declarations with accompanying proofs. This represents textbook-scale formalization at a pace that would typically require months or years of expert effort. On FATE-H, we achieve 96% proof success (vs. 80% for a strong baseline). Together, these results demonstrate that practical, large-scale automated formalization of mathematical literature is within reach. The full generated Lean code from our runs is available at https://github.com/optsuite/ReasBook.git.
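The verifier-in-the-loop idea described above, committing an edit only when feedback confirms improvement, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `Diagnostics`, `toy_verify`, and `accept_revert` are hypothetical names, the verifier merely counts marker strings instead of invoking the Lean toolchain, and the lexicographic objective (errors first, then placeholders) is an assumed notion of "improvement".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Diagnostics:
    """Toy stand-in for verifier feedback (assumed shape, not M2F's)."""
    errors: int        # hard compile errors
    placeholders: int  # remaining proof holes

    def key(self) -> Tuple[int, int]:
        # Assumed objective: fewer errors first, then fewer placeholders.
        return (self.errors, self.placeholders)


def toy_verify(source: str) -> Diagnostics:
    """Hypothetical verifier: counts marker strings instead of running Lean."""
    return Diagnostics(errors=source.count("ERR"),
                       placeholders=source.count("sorry"))


def accept_revert(
    source: str,
    proposals: Iterable[Callable[[str], str]],
    verify: Callable[[str], Diagnostics] = toy_verify,
) -> Tuple[str, Diagnostics]:
    """Apply each proposed edit; keep it only if the verifier reports
    a strict improvement, otherwise discard it (revert)."""
    best = verify(source)
    for propose in proposals:
        candidate = propose(source)
        diag = verify(candidate)
        if diag.key() < best.key():
            source, best = candidate, diag  # accept
        # else: revert, i.e. keep the previous source unchanged
    return source, best
```

With this objective, an edit that trades a compile error for a placeholder is accepted, which mirrors Stage 1 allowing placeholders in proofs, while an edit that introduces a new error is always reverted.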

Paper Structure (51 sections, 8 equations, 5 figures, 13 tables, 2 algorithms)

Introduction
Related Work
Problem Setup and Pipeline Overview
Input, Environment, and Output
Two-Stage Pipeline
Method: VeriRefine for M2F
Oracles, Diagnostics, and Scopes
Objectives and Verifier-Certified Accept/Revert
Patch-Proposal Operators (Instantiation Layer)
Preprocessing: Input as Ordered JSON Items
Statement Compilation and Proof Repair
M2F: Interface Contract
Experiments
Setup
Project Artifacts and Statement Compilation
...and 36 more sections

Figures (5)

Figure 1: The M2F pipeline for project-scale automated formalization.
Figure 2: PSR on FATE-H across provers.
Figure 3: FATE-H per-problem code length (non-empty lines) and outcome category. Colors indicate outcome: green = solved automatically, yellow = solved with lemma-map supervision, red = unsolved; the single wrong-statement instance is shown with a distinct style (see the failure analysis in the experiments section).
Figure 4: System capability manifesto (workflow). A verifier-in-the-loop pipeline that turns PDF-derived structure into a buildable Lean project with (i) provenance anchoring for statement auditability, (ii) human gatekeeping at governance points, and (iii) accept/revert refinement driven solely by Lean diagnostics, yielding a queryable repository of verified declarations rather than isolated one-off proofs.
Figure 5: System capability manifesto (navigability and locality). Rockafellar Convex Analysis (§1–§15) is re-indexed into a ToC-faithful Lean project. The right panel is an indented file tree (not full paths) illustrating structure-preserving splitting (e.g., a single source section expanded into many _partK modules) while keeping a stable section module for downstream imports and verifier-local edits.
