Table of Contents
Fetching ...

Selene: Pioneering Automated Proof in Software Verification

Lichen Zhang, Shuai Lu, Nan Duan

TL;DR

Selene introduces a project-level automated proof benchmark for industrial-scale software verification, grounded in the seL4 microkernel and Isabelle-based verification. It provides a lightweight lemma-isolation framework and end-to-end proof generation pipeline that enables rapid evaluation of LLMs like GPT-3.5-turbo and GPT-4 on real-world dependencies. The study shows that GPT-4 can achieve meaningful proof success, especially for simpler, declarative lemmas, and demonstrates that augmentation strategies (similar lemmas, dependency facts, and error-guided fixes) can improve performance. Collectively, Selene demonstrates both the promise and the current limits of using LLMs to automate formal proofs in large-scale software verification, offering concrete directions for future research, including interactive proof state and advanced fact extraction.

Abstract

Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.

Selene: Pioneering Automated Proof in Software Verification

TL;DR

Selene introduces a project-level automated proof benchmark for industrial-scale software verification, grounded in the seL4 microkernel and Isabelle-based verification. It provides a lightweight lemma-isolation framework and end-to-end proof generation pipeline that enables rapid evaluation of LLMs like GPT-3.5-turbo and GPT-4 on real-world dependencies. The study shows that GPT-4 can achieve meaningful proof success, especially for simpler, declarative lemmas, and demonstrates that augmentation strategies (similar lemmas, dependency facts, and error-guided fixes) can improve performance. Collectively, Selene demonstrates both the promise and the current limits of using LLMs to automate formal proofs in large-scale software verification, offering concrete directions for future research, including interactive proof state and advanced fact extraction.

Abstract

Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.
Paper Structure (44 sections, 4 figures, 8 tables)

This paper contains 44 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: A demonstration of the Selenpipeline for automated proof generation (best viewed in color). Selenfacilitates both the construction of proofs from scratch (indicated by the gray "generate" path) and the refinement of existing proofs augmented by error messages (highlighted by the red "fixing" path). To validate the correctness of the generated proofs, they are subjected to verification by the Isabelle prover within the authentic seL4 environment.
  • Figure 2: An illustrative example of the seL4 verification structure. The arrows pointing from A to B indicate that B is dependent upon A, where A and B can be lemmas, theory files, or sessions, etc.
  • Figure 3: A working example of lemma isolation in Selen. Based on the original seL4 structure in Figure \ref{['fig:sel4']}, we construct an isolated session (AInvs_TGT) along with a dependency session (AInvs_DEP) to facilitate efficient verification of the target lemma (strengthen_Not).
  • Figure 4: Demonstrative examples of similar lemma augmentation and dependency augmentation.