Self-HWDebug: Automation of LLM Self-Instructing for Hardware Security Verification

Mohammad Akyash; Hadi Mardani Kamali

TL;DR

This work tackles the burden of debugging hardware vulnerabilities in RTL/SoC designs by introducing Self-HWDebug, an LLM-driven framework that autonomously generates targeted debugging instructions from vulnerable/secure RTL pairs and applies them to unseen designs within the same CWE category. The method employs a two-stage process with multi-level instruction granularity (Basic, Intermediate, Advanced) and supports one-shot and two-shot reference strategies, using formulas such as $I_{x1,i} = M(T_{x,i} \oplus V_{x1} \oplus S_{x1})$, $S_{xi} = M(T_g \oplus I_{xi} \oplus V_{x})$, and $I_{xt} = M(T_{xt} \oplus V_{x1} \oplus S_{x1} \oplus V_{x2} \oplus S_{x2})$ to generate, apply, and validate fixes. The authors experiment across five CWEs with Llama3-70B (via Groq API) and evaluate one-shot versus two-shot setups, as well as the impact of more advanced models like GPT-4 for guidance. Results indicate that increasing the number of references and using higher instruction levels improve mitigation success, with GPT-4-assisted distillation significantly enhancing Llama3 repairs. The work demonstrates promising scalability and reduced expert effort for hardware security verification, and points to future work on expanding CWE coverage and using LLMs as both detectors and mitigators for large-scale SoC designs.
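The two-stage flow in the formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads the ⊕ operator as plain prompt concatenation, M as any prompt-to-text model, T as a task description, and V/S as vulnerable and secure RTL snippets. The function names, the separator `SEP`, and `stub_model` are all hypothetical and stand in for a real LLM call.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical model interface: prompt text in, completion text out.
Model = Callable[[str], str]

# Assumed separator used to realize the ⊕ operator as concatenation.
SEP = "\n\n---\n\n"


def concat(*parts: str) -> str:
    """Join prompt segments in order (our reading of the ⊕ operator)."""
    return SEP.join(parts)


def generate_instruction(model: Model, task: str,
                         vulnerable_ref: str, secure_ref: str) -> str:
    """Stage 1, one-shot: I = M(T ⊕ V_ref ⊕ S_ref)."""
    return model(concat(task, vulnerable_ref, secure_ref))


def generate_instruction_two_shot(model: Model, task: str,
                                  refs: Iterable[Tuple[str, str]]) -> str:
    """Stage 1, two-shot: I = M(T ⊕ V1 ⊕ S1 ⊕ V2 ⊕ S2)."""
    parts = [task]
    for vulnerable, secure in refs:
        parts.extend([vulnerable, secure])
    return model(concat(*parts))


def apply_instruction(model: Model, general_task: str,
                      instruction: str, vulnerable_target: str) -> str:
    """Stage 2: S = M(T_g ⊕ I ⊕ V), a candidate fix for an unseen design."""
    return model(concat(general_task, instruction, vulnerable_target))


def stub_model(prompt: str) -> str:
    """Placeholder for a real LLM; reports how many segments it received."""
    return f"<response to {prompt.count(SEP) + 1} segments>"


if __name__ == "__main__":
    instr = generate_instruction(stub_model, "task text", "vulnerable RTL", "secure RTL")
    fix = apply_instruction(stub_model, "general task", instr, "unseen vulnerable RTL")
    print(instr)
    print(fix)
```

Passing the model as a callable keeps the sketch independent of any particular provider; swapping `stub_model` for a function that wraps a real API would be the only change needed to try it against an actual LLM.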

Abstract

The rise of instruction-tuned Large Language Models (LLMs), which are tailored to respond to specific prompts, marks a significant advancement in artificial intelligence (AI). Despite their popularity, applying such models to debug security vulnerabilities in hardware designs, i.e., register-transfer level (RTL) modules, particularly at the system-on-chip (SoC) level, presents considerable challenges. One of the main issues is the need for precisely designed instructions for pinpointing and mitigating the vulnerabilities, which demands substantial time and expertise from human experts. In response to this challenge, this paper proposes Self-HWDebug, a framework that leverages LLMs to automatically create the required debugging instructions. In Self-HWDebug, a set of already identified bugs from the most critical hardware common weakness enumeration (CWE) listings, along with their mitigations, is provided to the framework, and the LLMs are then prompted to generate targeted instructions for such mitigation. The LLM-generated instructions are subsequently used as references to address vulnerabilities within the same CWE category but in entirely different designs, demonstrating the framework's ability to extend solutions across related security issues. Self-HWDebug significantly reduces human intervention by using the model's own output to guide debugging. Through comprehensive testing, Self-HWDebug is shown not only to reduce experts' effort and time but also to improve the quality of the debugging process.

Paper Structure (14 sections, 7 figures, 2 tables)

Introduction
Background and Related Works
Proposed Scheme: Self-HWDebug
Instruction Generation at Multiple Levels
Mitigating Vulnerabilities with Generated Instructions
Using Multiple References for Higher Accuracy
Experiments and Results
Bugs Descriptions
Experimental Settings
Instruction Generation in One- and Two-Shot Approaches
Instruction Generation by More Advanced Model
Comparison of Different Levels of Instruction
Takeaways for Self Instructing in Hardware Verification
Conclusion and Future Work

Figures (7)

Figure 1: The Use of LLMs for SW/HW Coding (Design) and Test (Verification).
Figure 2: The Overview of Self-Instructing for HW Security Debugging (Based on One-Shot Learning - One Reference for Self-Instructing).
Figure 3: Top View of Task Descriptions at Three Levels (I$_1$-basic, I$_2$-intermediate, I$_3$-advanced) for Instructions' Generation in Self-HWDebug (Sample CWE 1191 for One-shot Learning).
Figure 4: Top View of Generated Instructions (I$_1$-basic, I$_2$-intermediate, I$_3$-advanced) by Llama3 for Sample CWE 1191 based on One-shot Learning.
Figure 5: The Overview of Self-Instructing for HW Security Debugging (Based on Two-Shot Learning - Two References for Self-Instructing).
...and 2 more figures
