Table of Contents
Fetching ...

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

Deepak Narayan Gadde, Aman Kumar, Thomas Nalapat, Evgenii Rezunov, Fabio Cappellini

TL;DR

It was found that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks, so this dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.

Abstract

Modern hardware designs have grown increasingly efficient and complex. However, they are often susceptible to Common Weakness Enumerations (CWEs). This paper is focused on the formal verification of CWEs in a dataset of hardware designs written in SystemVerilog from Regenerative Artificial Intelligence (AI) powered by Large Language Models (LLMs). We applied formal verification to categorize each hardware design as vulnerable or CWE-free. This dataset was generated by 4 different LLMs and features a unique set of designs for each of the 10 CWEs we target in our paper. We have associated the identified vulnerabilities with CWE numbers for a dataset of 60,000 generated SystemVerilog Register Transfer Level (RTL) code. It was also found that most LLMs are not aware of any hardware CWEs; hence they are usually not considered when generating the hardware code. Our study reveals that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks. The dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.

All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification

TL;DR

It was found that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks, so this dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.

Abstract

Modern hardware designs have grown increasingly efficient and complex. However, they are often susceptible to Common Weakness Enumerations (CWEs). This paper is focused on the formal verification of CWEs in a dataset of hardware designs written in SystemVerilog from Regenerative Artificial Intelligence (AI) powered by Large Language Models (LLMs). We applied formal verification to categorize each hardware design as vulnerable or CWE-free. This dataset was generated by 4 different LLMs and features a unique set of designs for each of the 10 CWEs we target in our paper. We have associated the identified vulnerabilities with CWE numbers for a dataset of 60,000 generated SystemVerilog Register Transfer Level (RTL) code. It was also found that most LLMs are not aware of any hardware CWEs; hence they are usually not considered when generating the hardware code. Our study reveals that approximately 60% of the hardware designs generated by LLMs are prone to CWEs, posing potential safety and security risks. The dataset could be ideal for training LLMs and Machine Learning (ML) algorithms to abstain from generating CWE-prone hardware designs.
Paper Structure (14 sections, 6 figures, 4 tables)

This paper contains 14 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: ReFormAI dataset generation and vulnerability labelling with formal verification
  • Figure 2: Formal verifier aman_dvcon_ecc
  • Figure 3: Pass@k score for the designs generated from different LLMs and difficulty levels
  • Figure 4: SystemVerilog keyword frequency in ReFormAI
  • Figure 5: CWE Pass@k results for all LLMs represented as heatmaps
  • ...and 1 more figures