Selene: Pioneering Automated Proof in Software Verification

Lichen Zhang; Shuai Lu; Nan Duan

Selene: Pioneering Automated Proof in Software Verification

Lichen Zhang, Shuai Lu, Nan Duan

TL;DR

Selene introduces a project-level automated proof benchmark for industrial-scale software verification, grounded in the seL4 microkernel and Isabelle-based verification. It provides a lightweight lemma-isolation framework and end-to-end proof generation pipeline that enables rapid evaluation of LLMs like GPT-3.5-turbo and GPT-4 on real-world dependencies. The study shows that GPT-4 can achieve meaningful proof success, especially for simpler, declarative lemmas, and demonstrates that augmentation strategies (similar lemmas, dependency facts, and error-guided fixes) can improve performance. Collectively, Selene demonstrates both the promise and the current limits of using LLMs to automate formal proofs in large-scale software verification, offering concrete directions for future research, including interactive proof state and advanced fact extraction.

Abstract

Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.

Selene: Pioneering Automated Proof in Software Verification

TL;DR

Abstract

Paper Structure (44 sections, 4 figures, 8 tables)

This paper contains 44 sections, 4 figures, 8 tables.

Introduction
Related Work
Automated Theorem Proving by LLM
Automated Software Verification
Selene
Preliminary of SeL4
Isabelle sessions
SeL4 verification structure
Lemma Isolation in Selene
Key Know-how about Selene
Lemma extraction
Dependency session construction
Lemma category
Correctness checking
Evaluation
...and 29 more sections

Figures (4)

Figure 1: A demonstration of the Selenpipeline for automated proof generation (best viewed in color). Selenfacilitates both the construction of proofs from scratch (indicated by the gray "generate" path) and the refinement of existing proofs augmented by error messages (highlighted by the red "fixing" path). To validate the correctness of the generated proofs, they are subjected to verification by the Isabelle prover within the authentic seL4 environment.
Figure 2: An illustrative example of the seL4 verification structure. The arrows pointing from A to B indicate that B is dependent upon A, where A and B can be lemmas, theory files, or sessions, etc.
Figure 3: A working example of lemma isolation in Selen. Based on the original seL4 structure in Figure \ref{['fig:sel4']}, we construct an isolated session (AInvs_TGT) along with a dependency session (AInvs_DEP) to facilitate efficient verification of the target lemma (strengthen_Not).
Figure 4: Demonstrative examples of similar lemma augmentation and dependency augmentation.

Selene: Pioneering Automated Proof in Software Verification

TL;DR

Abstract

Selene: Pioneering Automated Proof in Software Verification

Authors

TL;DR

Abstract

Table of Contents

Figures (4)