MerLean: An Agentic Framework for Autoformalization in Quantum Computation
Yuanjie Ren, Jinzheng Li, Yidi Qi
TL;DR
MerLean addresses verification bottlenecks in frontier mathematical physics by automatically formalizing LaTeX statements into machine-checked Lean 4 code. It uses a bidirectional, agentic workflow where a frontier LLM translates LaTeX into Lean, then a second pass autoinformalizes the verified Lean back into LaTeX for semantic review. On three quantum computing papers, MerLean produced 114 statements and 2,050 Lean declarations, introducing necessary definitions and axioms when Mathlib lacked corresponding results, and completing end-to-end formalization within under 42 hours. The approach demonstrates practical utility for machine-verified peer review and data generation to train future reasoning models, with potential to generalize to other rigorous domains in mathematics and physics.
Abstract
We introduce MerLean, a fully automated agentic framework for autoformalization in quantum computation. MerLean extracts mathematical statements from \LaTeX{} source files, formalizes them into verified Lean~4 code built on Mathlib, and translates the result back into human-readable \LaTeX{} for semantic review. We evaluate MerLean on three theoretical quantum computing papers producing 2,050 Lean declarations from 114 statements in total. MerLean achieves end-to-end formalization on all three papers, reducing the verification burden to only the newly introduced definitions and axioms. Our results demonstrate that agentic autoformalization can scale to frontier research, offering both a practical tool for machine-verified peer review and a scalable engine for mining high-quality synthetic data to train future reasoning models. Our approach can also be generalized to any other rigorous research in mathematics and theoretical physics.
