Table of Contents
Fetching ...

Towards Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4

Jianyu Zhang, Fuyuan Zhang, Jiayi Lu, Jilin Hu, Xiaoyi Yin, Long Zhang, Feng Yang, Yongwang Zhao

TL;DR

This work introduces AutoReal, an LLM-driven theorem proving approach designed for real-world industrial-scale verification. By combining step-level chain-of-thought training with targeted proof-context augmentation, AutoReal enables a compact 7B model to generate complete Isabelle proofs with aligned natural-language rationales, suitable for local deployment. In evaluations on seL4 and AFP, AutoReal-Prover achieves 51.67% and 53.88% proof success rates respectively, substantially surpassing prior results on seL4 and demonstrating generalization to other verification domains. The approach also provides interpretable explanations to aid human inspection and repair, marking a meaningful advance in practical, scalable, and transparent AI-assisted formal verification.

Abstract

Formal methods (FM) are reliable but costly to apply, often requiring years of expert effort in industrial-scale projects such as seL4, especially for theorem proving. Recent advances in large language models (LLMs) have made automated theorem proving increasingly feasible. However, most prior work focuses on mathematics-oriented benchmarks such as miniF2F, with limited evaluation on real-world verification projects. The few studies that consider industrial-scale verification mostly rely on closed-source models with hundreds of billions of parameters, which cannot be locally deployed and incur substantial usage costs. In this paper, we propose AutoReal, an LLM-driven theorem proving method for real-world industrial-scale systems with support for lightweight local deployment. We evaluate AutoReal on the seL4-Isabelle verification project as a representative and challenging case study. AutoReal incorporates two key improvements: (1) chain-of-thought (CoT)-based proof training, which teaches the LLM the reasoning behind proof steps and enables step-wise explanations alongside proofs, and (2) context augmentation, which leverages proof context from the project to enhance LLM-driven proving. Based on the AutoReal methodology, we fine-tune a base model to obtain AutoReal-Prover, a compact 7B-scale prover for industrial-scale theorem proving. AutoReal-Prover achieves a 51.67% proof success rate on 660 theorems from seL4-designated Important Theories across all 10 seL4 proof categories, substantially outperforming prior attempts on seL4 (27.06%). To evaluate generalization, we further apply AutoReal-Prover to three security-related projects from the Archive of Formal Proofs (AFP), covering all 451 theorems and achieving a proof success rate of 53.88%. Overall, this work advances the application of LLM-driven theorem proving in real-world industrial-scale verification.

Towards Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4

TL;DR

This work introduces AutoReal, an LLM-driven theorem proving approach designed for real-world industrial-scale verification. By combining step-level chain-of-thought training with targeted proof-context augmentation, AutoReal enables a compact 7B model to generate complete Isabelle proofs with aligned natural-language rationales, suitable for local deployment. In evaluations on seL4 and AFP, AutoReal-Prover achieves 51.67% and 53.88% proof success rates respectively, substantially surpassing prior results on seL4 and demonstrating generalization to other verification domains. The approach also provides interpretable explanations to aid human inspection and repair, marking a meaningful advance in practical, scalable, and transparent AI-assisted formal verification.

Abstract

Formal methods (FM) are reliable but costly to apply, often requiring years of expert effort in industrial-scale projects such as seL4, especially for theorem proving. Recent advances in large language models (LLMs) have made automated theorem proving increasingly feasible. However, most prior work focuses on mathematics-oriented benchmarks such as miniF2F, with limited evaluation on real-world verification projects. The few studies that consider industrial-scale verification mostly rely on closed-source models with hundreds of billions of parameters, which cannot be locally deployed and incur substantial usage costs. In this paper, we propose AutoReal, an LLM-driven theorem proving method for real-world industrial-scale systems with support for lightweight local deployment. We evaluate AutoReal on the seL4-Isabelle verification project as a representative and challenging case study. AutoReal incorporates two key improvements: (1) chain-of-thought (CoT)-based proof training, which teaches the LLM the reasoning behind proof steps and enables step-wise explanations alongside proofs, and (2) context augmentation, which leverages proof context from the project to enhance LLM-driven proving. Based on the AutoReal methodology, we fine-tune a base model to obtain AutoReal-Prover, a compact 7B-scale prover for industrial-scale theorem proving. AutoReal-Prover achieves a 51.67% proof success rate on 660 theorems from seL4-designated Important Theories across all 10 seL4 proof categories, substantially outperforming prior attempts on seL4 (27.06%). To evaluate generalization, we further apply AutoReal-Prover to three security-related projects from the Archive of Formal Proofs (AFP), covering all 451 theorems and achieving a proof success rate of 53.88%. Overall, this work advances the application of LLM-driven theorem proving in real-world industrial-scale verification.
Paper Structure (57 sections, 12 figures, 4 tables)

This paper contains 57 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: The application scenario of AutoReal in the seL4 verification workflow.
  • Figure 2: A running example for AutoReal.
  • Figure 3: Method overview of AutoReal
  • Figure 4: Step-level CoT data construction from Isabelle proof traces.
  • Figure 5: Training instance structure for CoT-based proof training.
  • ...and 7 more figures