SITA: A Framework for Structure-to-Instance Theorem Autoformalization
Chenyi Li, Wanli Ma, Zichen Wang, Zaiwen Wen
TL;DR
Structure-to-instance theorem autoformalization addresses how abstract theories can be instantiated in concrete settings within Lean, enabling reusable, verified formal reasoning. SITA is an end-to-end pipeline that combines LLM-driven skeleton construction, error-guided refinement, and postprocessing to generate definitions, instances, and proofs tied to abstract templates. Empirical results on optimization problems show that SITA improves formalization completeness and proof success over direct generation; ablation studies and a growing benchmark of formal problems support these findings. This work advances scalable, verifiable formal libraries and datasets, bridging symbolic reasoning and automated synthesis for research-level mathematics.
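The three stages above can be pictured as a generate-check-repair loop. The Python sketch below is illustrative only and is not SITA's implementation: `formalize_instance`, its callable parameters, the round budget `max_rounds`, and the `stub_*` functions are all hypothetical stand-ins. In a real setting, `check` would run Lean and return its diagnostics, while `generate_skeleton` and `repair` would query an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Attempt:
    source: str        # candidate Lean file text
    errors: List[str]  # diagnostics reported for this candidate


@dataclass
class Result:
    source: str
    verified: bool
    history: List[Attempt] = field(default_factory=list)


def formalize_instance(
    template: str,
    instance_spec: str,
    generate_skeleton: Callable[[str, str], str],
    check: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    postprocess: Callable[[str], str],
    max_rounds: int = 3,
) -> Result:
    """Hypothetical sketch: skeleton -> error-guided refinement -> postprocessing."""
    # Stage 1: draft definitions, instances, and theorem statements from the template.
    source = generate_skeleton(template, instance_spec)
    history: List[Attempt] = []
    # Stage 2: check the candidate and feed errors back until it is clean
    # or the (assumed) repair budget is exhausted.
    for round_idx in range(max_rounds + 1):
        errors = check(source)
        history.append(Attempt(source, errors))
        if not errors:
            # Stage 3: postprocess only candidates that checked cleanly.
            return Result(postprocess(source), True, history)
        if round_idx == max_rounds:
            break
        source = repair(source, errors)
    return Result(source, False, history)


# Hypothetical stubs standing in for the LLM and the Lean checker.
def stub_skeleton(template: str, spec: str) -> str:
    return f"theorem {spec}_ok : 1 = 1 := sorry\n"


def stub_check(src: str) -> List[str]:
    return ["declaration uses 'sorry'"] if "sorry" in src else []


def stub_repair(src: str, errors: List[str]) -> str:
    return src.replace("sorry", "rfl", 1)


if __name__ == "__main__":
    res = formalize_instance("T", "nat", stub_skeleton, stub_check, stub_repair, str.strip)
    print(res.verified, len(res.history))  # prints: True 2
```

Keeping every attempt in `history` mirrors the idea that error messages, not just the final output, drive each refinement round.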
Abstract
While large language models (LLMs) have shown progress in mathematical reasoning, they still struggle to formalize theorems that arise from instantiating abstract structures in concrete settings. With the goal of autoformalizing research-level mathematical results, we develop a framework for structure-to-instance theorem autoformalization (SITA), which systematically bridges the gap between abstract mathematical theories and their concrete applications in the Lean proof assistant. Formalized abstract structures are treated as modular templates that contain definitions, assumptions, operations, and theorems. These templates serve as reusable guides for formalizing concrete instances. Given a specific instantiation, we generate the corresponding Lean definitions and instance declarations, integrate them through Lean's typeclass mechanism, and construct verified theorems by checking that the instance satisfies the template's structural assumptions. We combine LLM-based generation with feedback-guided refinement to achieve both automation and formal correctness. Experiments on a dataset of optimization problems demonstrate that SITA effectively formalizes diverse instances grounded in abstract structures.
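To make the template/instance split concrete, here is a toy Lean 4 sketch using only core Lean (no Mathlib). The class `DescentScheme`, its fields `obj`, `step`, and `step_le`, the theorem `two_steps_le`, and the `Nat` instance are hypothetical illustrations, not taken from SITA or its benchmark. The class bundles operations with a structural assumption, the theorem is proved once from that assumption, and the instance declaration is where the assumption must be discharged for the concrete setting.

```lean
-- Hypothetical toy template (not from SITA): an objective `obj`, an update
-- `step`, and the structural assumption that `step` never increases `obj`.
class DescentScheme (α : Type) where
  obj : α → Nat
  step : α → α
  step_le : ∀ x, obj (step x) ≤ obj x

-- Proved once at the abstract level, using only the assumption `step_le`.
theorem DescentScheme.two_steps_le {α : Type} [DescentScheme α] (x : α) :
    DescentScheme.obj (DescentScheme.step (DescentScheme.step x))
      ≤ DescentScheme.obj x :=
  Nat.le_trans (DescentScheme.step_le (DescentScheme.step x)) (DescentScheme.step_le x)

-- Hypothetical concrete instance: natural numbers, objective `n`, update `n - 1`.
-- Supplying `step_le` is where the structural assumption is checked.
instance : DescentScheme Nat where
  obj := fun n => n
  step := fun n => n - 1
  step_le := fun n => Nat.sub_le n 1

-- The abstract theorem specializes to the concrete setting via typeclass resolution.
example (n : Nat) : n - 1 - 1 ≤ n :=
  DescentScheme.two_steps_le (α := Nat) n
```

In this reading, the generation task for a new instance reduces to producing the `instance` block and its proof obligations; the abstract theorem then applies without being re-proved.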
