Using LLMs to Facilitate Formal Verification of RTL

Marcelo Orenes-Vera; Margaret Martonosi; David Wentzlaff

Using LLMs to Facilitate Formal Verification of RTL

Marcelo Orenes-Vera, Margaret Martonosi, David Wentzlaff

TL;DR

GPT4 can generate correct SVA even for flawed RTL, without mirroring design errors, and generates SVA that exposed a bug in the RISC-V CVA6 core that eluded the prior work's evaluation.

Abstract

Formal property verification (FPV) has existed for decades and has been shown to be effective at finding intricate RTL bugs. However, formal properties, such as those written as SystemVerilog Assertions (SVA), are time-consuming and error-prone to write, even for experienced users. Prior work has attempted to lighten this burden by raising the abstraction level so that SVA is generated from high-level specifications. However, this does not eliminate the manual effort of reasoning and writing about the detailed hardware behavior. Motivated by the increased need for FPV in the era of heterogeneous hardware and the advances in large language models (LLMs), we set out to explore whether LLMs can capture RTL behavior and generate correct SVA properties. First, we design an FPV-based evaluation framework that measures the correctness and completeness of SVA. Then, we evaluate GPT4 iteratively to craft the set of syntax and semantic rules needed to prompt it toward creating better SVA. We extend the open-source AutoSVA framework by integrating our improved GPT4-based flow to generate safety properties, in addition to facilitating their existing flow for liveness properties. Lastly, our use cases evaluate (1) the FPV coverage of GPT4-generated SVA on complex open-source RTL and (2) using generated SVA to prompt GPT4 to create RTL from scratch. Through these experiments, we find that GPT4 can generate correct SVA even for flawed RTL, without mirroring design errors. Particularly, it generated SVA that exposed a bug in the RISC-V CVA6 core that eluded the prior work's evaluation.

Using LLMs to Facilitate Formal Verification of RTL

TL;DR

GPT4 can generate correct SVA even for flawed RTL, without mirroring design errors, and generates SVA that exposed a bug in the RISC-V CVA6 core that eluded the prior work's evaluation.

Abstract

Paper Structure (11 sections, 3 figures, 1 table)

This paper contains 11 sections, 3 figures, 1 table.

Introduction and Background
Iteratively Teaching SVA to LLMs
Rule-refinement Methodology
Experiments with GPT4
Assertion Quality and Determinism
AutoSVA2: Extending AutoSVA with GPT4
Use case 1: Testing AutoSVA2 on Complex RTL
Building the PTW FT and exposing an RTL Bug
RTL Coverage of Automatically Generated SVA
Use case 2: AutoSVA2 guiding RTL generation
Discussion and Conclusion

Figures (3)

Figure 1: FPV-based evaluation framework. The FPV tool returns whether the assertions generated by the LLM are correct or not---for a given RTL. Hinted by the errors or CEXs of the FPV report, the engineer manually writes or refines the rules that guide the LLM toward generating better SVA. The rule set and the RTL are combined into a prompt in order to generate a new iteration of SVA properties. The green boxes are automated steps; the blue ones are manual.
Figure 2: Overview of AutoSVA2. Our additions to the original AutoSVA flow are shown with thick boxes and arrows; the original flow is shown with thin boxes and arrows. The green boxes indicate automatically generated artifacts. The green arrows indicate the SVA generation flow and the blue arrows the annotation generation flow. The engineer in the loop revises both the annotations and the generated SVA.
Figure 3: Methodology for iteratively building RTL and FT. The blue arrows show the RTL generation flow; green arrows, the SVA flow; black arrows, the assertion debugging flow with the human in the loop. The green boxes indicate they are automatically generated, and the blue box, manually written.

Using LLMs to Facilitate Formal Verification of RTL

TL;DR

Abstract

Using LLMs to Facilitate Formal Verification of RTL

Authors

TL;DR

Abstract

Table of Contents

Figures (3)