LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation

Zixi Zhang; Balint Szekely; Pedro Gimenes; Greg Chadwick; Hugo McNally; Jianyi Cheng; Robert Mullins; Yiren Zhao

TL;DR

The paper tackles the labor-intensive task of generating test stimuli for hardware design verification (DV) by introducing LLM4DV, a benchmarking framework that prompts LLMs to generate stimuli guided by coverage feedback. It evaluates six LLMs on eight designs under test (DUTs) with six prompting enhancements, showing that suitably prompted LLMs can match or exceed naive constrained-random testing in coverage while reducing human effort. Its key contributions are the open-source LLM4DV framework, a set of prompting techniques, and an evaluation spanning diverse DUTs and LLMs, which together provide a reproducible platform for future research. The work suggests that LLMs can meaningfully aid hardware DV on many designs, but that more work is needed to scale to highly complex DUTs and to establish broader benchmarks and datasets for the field.

Abstract

Hardware design verification (DV) is a process that checks the functional equivalence of a hardware design against its specifications, improving hardware reliability and robustness. A key task in the DV process is test stimuli generation, which creates a set of conditions or inputs for testing. These test conditions are often complex and specific to the given hardware design, requiring substantial human engineering effort to optimize. We seek an automated and efficient testing solution for arbitrary hardware designs that takes advantage of large language models (LLMs). LLMs have already shown promising results for improving hardware design automation, but remain under-explored for hardware DV. In this paper, we propose an open-source benchmarking framework named LLM4DV that efficiently orchestrates LLMs for automated hardware test stimuli generation. Our analysis evaluates six different LLMs with six prompting improvements over eight hardware designs and provides insight for future work on LLM development for efficient automated DV.

Paper Structure (8 sections, 2 figures, 3 tables)

Introduction
Background and Related Work
LLM4DV Benchmarks
LLM4DV Framework
Evaluation setup: DUTs and models
LLM4DV prompting enhancements
Results and Analysis
Conclusion: gimmick or trend?

Figures (2)

Figure 1: An overview of the LLM4DV framework. The right part shows a traditional DV process, in which DV engineers manually interact with the process by tailoring various stimuli and observing the coverage. Such a manual process is often iterative. The left part highlights our contribution, which adds a stimulus generation agent for automated guidance.
Figure 2: Example prompts and responses on the Primitive Data Prefetcher Core module. The purple box is the system message, containing a general format instruction. The green box is an initial query, containing a coverage plan summary (orange text). The blue box is an interactive query, containing the differences, i.e., coverage feedback (red text).
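
The prompt structure described in the Figure 2 caption (a system message, an initial query carrying the coverage plan, then interactive queries carrying coverage feedback) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the LLM4DV implementation: the prompt wording, the function names, the toy coverage model, and the `stub_llm` stand-in for a real model are all hypothetical.

```python
import re
from typing import Callable, List, Set

# Hypothetical system message: a general format instruction for the model.
SYSTEM_MESSAGE = "Reply only with a list of integers, for example [1, 2, 3]."


def initial_query(bins: List[str]) -> str:
    """First prompt: summarises the coverage plan (all bins to hit)."""
    return "Coverage plan: " + ", ".join(bins) + ". Generate stimuli."


def interactive_query(uncovered: List[str]) -> str:
    """Follow-up prompt: coverage feedback listing the bins still unhit."""
    return "Still uncovered: " + ", ".join(uncovered) + ". Generate stimuli."


def parse_stimuli(response: str) -> List[int]:
    """Extract integer stimuli from free-form model output."""
    return [int(tok) for tok in re.findall(r"-?\d+", response)]


def toy_coverage(stimuli: List[int]) -> Set[str]:
    """Toy stand-in for a simulated DUT: bin 'val_k' is hit by input k in 0..7."""
    return {f"val_{s}" for s in stimuli if 0 <= s < 8}


def run_loop(llm: Callable[[str, str], str], bins: List[str],
             max_rounds: int = 10) -> Set[str]:
    """Query the model repeatedly, feeding back uncovered bins each round."""
    covered: Set[str] = set()
    prompt = initial_query(bins)
    for _ in range(max_rounds):
        response = llm(SYSTEM_MESSAGE, prompt)
        covered |= toy_coverage(parse_stimuli(response)) & set(bins)
        uncovered = [b for b in bins if b not in covered]
        if not uncovered:
            break
        prompt = interactive_query(uncovered)
    return covered


def stub_llm(system: str, prompt: str) -> str:
    """Stand-in for a real model: targets the first three bins named in the prompt."""
    wanted = re.findall(r"val_(\d+)", prompt)
    return "[" + ", ".join(wanted[:3]) + "]"


if __name__ == "__main__":
    plan = [f"val_{i}" for i in range(8)]
    hit = run_loop(stub_llm, plan)
    print(f"covered {len(hit)} of {len(plan)} bins")
```

In a real setup, `stub_llm` would be replaced by a call to an actual model and `toy_coverage` by a simulator run that reports which coverage bins were hit; the loop shape stays the same.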
