Table of Contents
Fetching ...

PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

Ye Liu, Yue Xue, Daoyuan Wu, Yuqiang Sun, Yi Li, Miaolei Shi, Yang Liu

TL;DR

PropertyGPT tackles automated generation of formal verification properties for unknown smart contracts by leveraging retrieval-augmented learning over a knowledge base of human-written properties. It integrates an intermediate Property Specification Language with a PSL compiler and a dedicated prover, guided by compiler feedback, multi-signal ranking, and in-context learning from retrieved references. Empirical results show strong recall and competitive precision in property generation, improved vulnerability detection across CVEs and attack incidents, and tangible real-world impact through bug bounty rewards. The approach advances scalable, automated formal verification for smart contracts by reducing manual specification effort and enabling broader application to complex code bases.

Abstract

With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs,such as GPT-4, to transfer existing human-written properties (e.g.,those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel LLM-based property generation tool called PropertyGPT. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, leading to $8,256 in bug bounty rewards.

PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

TL;DR

PropertyGPT tackles automated generation of formal verification properties for unknown smart contracts by leveraging retrieval-augmented learning over a knowledge base of human-written properties. It integrates an intermediate Property Specification Language with a PSL compiler and a dedicated prover, guided by compiler feedback, multi-signal ranking, and in-context learning from retrieved references. Empirical results show strong recall and competitive precision in property generation, improved vulnerability detection across CVEs and attack incidents, and tangible real-world impact through bug bounty rewards. The approach advances scalable, automated formal verification for smart contracts by reducing manual specification effort and enabling broader application to complex code bases.

Abstract

With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs,such as GPT-4, to transfer existing human-written properties (e.g.,those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel LLM-based property generation tool called PropertyGPT. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, leading to $8,256 in bug bounty rewards.
Paper Structure (27 sections, 4 equations, 15 figures, 7 tables)

This paper contains 27 sections, 4 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: A high-level workflow of PropertyGPT.
  • Figure 2: Property Specification Language (PSL).
  • Figure 3: The prompt for generating rule properties.
  • Figure 4: The prompt for generating invariants/conditions.
  • Figure 5: The common prompt for revising (rule) properties.
  • ...and 10 more figures