SITA: A Framework for Structure-to-Instance Theorem Autoformalization

Chenyi Li, Wanli Ma, Zichen Wang, Zaiwen Wen

TL;DR

Structure-to-instance theorem autoformalization addresses how abstract theories can be instantiated in concrete settings within Lean, enabling reusable, verified formal reasoning. The SITA framework combines an end-to-end pipeline with LLM-driven skeleton construction, error-guided refinement, and postprocessing to generate definitions, instances, and proofs tied to abstract templates. Empirical results on optimization problems show that SITA improves formalization completeness and proof success compared with direct generation, supported by ablations and a growing benchmark of formal problems. This work advances scalable, verifiable formal libraries and datasets, bridging symbolic reasoning with automated synthesis for research-level mathematics.
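The generate, refine, and post-process flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not SITA's implementation. The stage order (backbone correction, then proof correction, then "harmless fixing" post-processing) follows the Figure 5 caption. Everything else is hypothetical: the names `refine`, `pipeline`, and `CheckResult`, the `toy_*` stand-ins, and the `max_rounds` budget. In a real system the checker would run Lean and the repairer would prompt an LLM with the error messages.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    """Outcome of checking a candidate file: success flag plus error messages."""
    ok: bool
    errors: List[str] = field(default_factory=list)


Checker = Callable[[str], CheckResult]
Repairer = Callable[[str, List[str]], str]


def refine(draft: str, check: Checker, repair: Repairer,
           max_rounds: int = 3) -> Optional[str]:
    """Error-guided refinement: pass checker errors back to the repairer
    (an LLM call in a real system) until the draft passes or the budget runs out."""
    for _ in range(max_rounds):
        result = check(draft)
        if result.ok:
            return draft
        draft = repair(draft, result.errors)
    # One last check after the final repair.
    return draft if check(draft).ok else None


def pipeline(skeleton: str, fill_proofs: Callable[[str], str],
             check_skeleton: Checker, check_full: Checker,
             repair: Repairer, postprocess: Callable[[str], str],
             max_rounds: int = 3) -> Optional[str]:
    """Skeleton correction, then proof correction, then post-processing."""
    # Stage 1: make definitions, instances and statements check (proofs may be `sorry`).
    fixed = refine(skeleton, check_skeleton, repair, max_rounds)
    if fixed is None:
        return None
    # Stage 2: fill in proofs and repair them against the full check.
    proved = refine(fill_proofs(fixed), check_full, repair, max_rounds)
    if proved is None:
        return None
    # Stage 3: clean up the accepted output.
    return postprocess(proved)


# Toy stand-ins for a Lean checker and an LLM repairer, for illustration only.
def toy_check_skeleton(src: str) -> CheckResult:
    errs = ["misspelled keyword theorm"] if "theorm" in src else []
    return CheckResult(not errs, errs)


def toy_check_full(src: str) -> CheckResult:
    errs = toy_check_skeleton(src).errors
    if "sorry" in src:
        errs = errs + ["proof contains sorry"]
    return CheckResult(not errs, errs)


def toy_repair(src: str, errors: List[str]) -> str:
    for err in errors:
        if "theorm" in err:
            src = src.replace("theorm", "theorem")
        if "sorry" in err:
            src = src.replace("sorry", "rfl")
    return src


result = pipeline("theorm t : 1 + 1 = 2 := by sorry", lambda s: s,
                  toy_check_skeleton, toy_check_full, toy_repair, str.strip)
print(result)  # prints: theorem t : 1 + 1 = 2 := by rfl
```

Splitting the checker into a skeleton check and a full check lets structural errors be fixed before any proof effort is spent. Post-processing runs only on output that has already passed.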

Abstract

While large language models (LLMs) have shown progress in mathematical reasoning, they still struggle to formalize theorems that arise from instantiating abstract structures in concrete settings. With the goal of autoformalizing mathematical results at the research level, we develop a framework for structure-to-instance theorem autoformalization (SITA), which systematically bridges the gap between abstract mathematical theories and their concrete applications in the Lean proof assistant. Formalized abstract structures are treated as modular templates that contain definitions, assumptions, operations, and theorems. These templates serve as reusable guides for the formalization of concrete instances. Given a specific instantiation, we generate the corresponding Lean definitions and instance declarations, integrate them using Lean's typeclass mechanism, and construct verified theorems by checking the structural assumptions. We combine LLM-based generation with feedback-guided refinement to ensure both automation and formal correctness. Experiments on a dataset of optimization problems demonstrate that SITA effectively formalizes diverse instances grounded in abstract structures.
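The template/instance idea can be made concrete with a minimal sketch in core Lean 4 (no Mathlib). The example is hypothetical. `applyN`, `DescentTemplate`, `iterate_le`, and the halving instance are illustrative names and are not SITA's optimization templates. The class plays the role of an abstract template with an operation and an assumption. The generic theorem is proved once from that assumption, and the concrete instance inherits it through typeclass resolution.

```lean
-- Hypothetical helper: apply `f` to `x` exactly `n` times.
def applyN {α : Type} (f : α → α) : Nat → α → α
  | 0, x => x
  | n + 1, x => f (applyN f n x)

-- Hypothetical abstract template: a carrier with a cost, an update operation
-- `step`, and a structural assumption that `step` never increases the cost.
class DescentTemplate (α : Type) where
  cost : α → Nat
  step : α → α
  step_le : ∀ x, cost (step x) ≤ cost x

-- Abstract theorem, proved once from the template's assumption alone.
theorem DescentTemplate.iterate_le {α : Type} [DescentTemplate α] (n : Nat) (x : α) :
    DescentTemplate.cost (applyN DescentTemplate.step n x) ≤ DescentTemplate.cost x := by
  induction n with
  | zero => exact Nat.le_refl _
  | succ n ih =>
    exact Nat.le_trans (DescentTemplate.step_le (applyN DescentTemplate.step n x)) ih

-- Concrete instance: natural numbers, halving as the step, identity as the cost.
-- Supplying `step_le` is where the structural assumption gets checked.
instance : DescentTemplate Nat where
  cost := id
  step := fun x => x / 2
  step_le := fun x => Nat.div_le_self x 2

-- Instance-level theorem obtained from the abstract one by typeclass resolution.
example (n x : Nat) : applyN (fun y => y / 2) n x ≤ x :=
  DescentTemplate.iterate_le (α := Nat) n x
```

In this sketch the instance only has to discharge `step_le`. The bound on iterates then follows from the abstract theorem without a new proof.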

Paper Structure

This paper contains 66 sections, 7 theorems, 16 equations, 6 figures, 7 tables, 2 algorithms.

Key Result

Lemma 1

The function $f$ has gradient $\nabla f(d) = M^\top (M d - b)$ at every $d$.
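Lemma 1 does not restate $f$. Assume, as the gradient formula suggests, that $f(d) = \tfrac12\|Md - b\|_2^2$ for a real matrix $M$ and vector $b$; this definition is an assumption, not taken from the source. Under it, the formula follows from a short expansion around $d$ in a direction $h$:

```latex
\begin{aligned}
f(d+h) &= \tfrac12\,\|(Md-b) + Mh\|_2^2 \\
       &= f(d) + \langle Md-b,\; Mh\rangle + \tfrac12\,\|Mh\|_2^2 \\
       &= f(d) + \langle M^\top(Md-b),\; h\rangle + \tfrac12\,\|Mh\|_2^2,
\end{aligned}
\qquad
0 \le \tfrac12\,\|Mh\|_2^2 \le \tfrac12\,\|M\|_2^2\,\|h\|_2^2 = o(\|h\|_2).
```

The linear term identifies $\nabla f(d) = M^\top(Md - b)$, and the remainder is quadratic in $h$.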

Figures (6)

  • Figure 1: Overall pipeline of SITA
  • Figure 2: Illustration of structure-to-instance formalization.
  • Figure 3: Evaluation performance across generation passes.
  • Figure 4: Distribution of algorithm types in the dataset
  • Figure 5: Time spent on each part of the generation process using DeepSeek-R1. "First correction" denotes the backbone correction stage, "second correction" the proof correction stage, and "harmless fixing" the output post-processing step.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Definition 1: Mathematical Structures with Operations
  • Example 1
  • Example 2
  • Example 3
  • Definition 2
  • Definition 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 3 more