
CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design

Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li

TL;DR

This paper proposes CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction. Its LLM-based corrector uses bug information obtained during self-validation to perform functional self-correction on the generated testbenches.

Abstract

Functional simulation is an essential step in digital hardware design. Recently, there has been growing interest in leveraging Large Language Models (LLMs) for hardware testbench generation. However, the inherent instability of LLMs often leads to functional errors in the generated testbenches. Previous methods lack functional correction mechanisms that work without human intervention and still suffer from low success rates, especially for sequential tasks. To address this issue, we propose CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction. Using only the RTL specification in natural language, the proposed approach can validate the correctness of the generated testbenches with a success rate of 88.85%. Furthermore, the proposed LLM-based corrector employs bug information obtained during the self-validation process to perform functional self-correction on the generated testbenches. The comparative analysis demonstrates that our method achieves a pass ratio of 70.13% across all evaluated tasks, compared with 52.18% for the previous LLM-based testbench generation framework and 33.33% for a direct LLM-based generation method. For sequential circuits specifically, our method's performance is 62.18% higher than that of the previous work and almost 5 times the pass ratio of the direct method. The code and experimental results are open-sourced at https://github.com/AutoBench/CorrectBench

Paper Structure

This paper contains 24 sections, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: The outline of the CorrectBench workflow.
  • Figure 2: The outline of the AutoBench workflow [autobench]. AutoBench is used as the testbench generator in Fig. 1.
  • Figure 3: A demo of the test scenario and test stimuli in AutoBench's Verilog driver. In this demo, two stimuli are contained in one scenario. The output signals from DUT will be exported and checked by a Python checker later.
  • Figure 4: Examples of RS Matrices. A red/green cell in the $i$th row and $j$th column indicates that the output of the $j$th scenario in the testbench is wrong/correct according to the simulation result of the $i$th RTL design. The two matrices on the left represent correct TBs, whereas the matrix on the right indicates errors.
  • Figure 5: A Demo of Corrector. The RTL problem is shift18, an arithmetic shifter. Some details are omitted to save space.
  • ...and 2 more figures
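
The RS matrix described in the Figure 4 caption can be pictured as a boolean grid with one row per RTL design and one column per test scenario. The sketch below is only an illustration under stated assumptions: the data are made up, and the function `scenario_agreement` is a hypothetical summary (the fraction of RTL designs that mark each scenario as correct), not the validation criterion used by CorrectBench, which this page does not specify.

```python
from typing import List


def scenario_agreement(rs_matrix: List[List[bool]]) -> List[float]:
    """Hypothetical summary of an RS matrix.

    rs_matrix[i][j] is True when the testbench output for scenario j is
    judged correct according to the simulation of RTL design i.
    Returns, per scenario (column), the fraction of RTL designs (rows)
    that mark it as correct.
    """
    if not rs_matrix:
        return []
    n_rtl = len(rs_matrix)
    n_scenarios = len(rs_matrix[0])
    return [
        sum(row[j] for row in rs_matrix) / n_rtl
        for j in range(n_scenarios)
    ]


# Made-up example: 3 RTL designs (rows) x 4 scenarios (columns).
rs = [
    [True, True, False, True],
    [True, True, False, True],
    [True, False, False, True],
]
agreement = scenario_agreement(rs)
```

In this made-up grid, a column that every RTL design marks as wrong (the third one) is the kind of pattern the caption associates with an erroneous testbench, while columns that are green throughout resemble the correct cases.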