All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

Deepak Narayan Gadde; Aman Kumar; Thomas Nalapat; Evgenii Rezunov; Fabio Cappellini

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

Deepak Narayan Gadde, Aman Kumar, Thomas Nalapat, Evgenii Rezunov, Fabio Cappellini

TL;DR

It was found that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks, so this dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.

Abstract

Modern hardware designs have grown increasingly efficient and complex. However, they are often susceptible to Common Weakness Enumerations (CWEs). This paper is focused on the formal verification of CWEs in a dataset of hardware designs written in SystemVerilog from Regenerative Artificial Intelligence (AI) powered by Large Language Models (LLMs). We applied formal verification to categorize each hardware design as vulnerable or CWE-free. This dataset was generated by 4 different LLMs and features a unique set of designs for each of the 10 CWEs we target in our paper. We have associated the identified vulnerabilities with CWE numbers for a dataset of 60,000 generated SystemVerilog Register Transfer Level (RTL) code. It was also found that most LLMs are not aware of any hardware CWEs; hence they are usually not considered when generating the hardware code. Our study reveals that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks. The dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

TL;DR

Abstract

Paper Structure (14 sections, 6 figures, 4 tables)

This paper contains 14 sections, 6 figures, 4 tables.

Introduction
Background
Common Weakness Enumerations
Prior Work
Formal Verification
LLM Evaluation Setup
The Problem Set
Input Parameters
Formal Verification
LLM Evaluation and Results
Reserach Questions
Results
Discussion and Limitations
Conclusion

Figures (6)

Figure 1: ReFormAI dataset generation and vulnerability labelling with formal verification
Figure 2: Formal verifier aman_dvcon_ecc
Figure 3: Pass@k score for the designs generated from different LLMs and difficulty levels
Figure 4: SystemVerilog keyword frequency in ReFormAI
Figure 5: CWE Pass@k results for all LLMs represented as heatmaps
...and 1 more figures

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

TL;DR

Abstract

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

Authors

TL;DR

Abstract

Table of Contents

Figures (6)