MEIC: Re-thinking RTL Debug Automation using LLMs

Ke Xu, Jialin Sun, Yuchen Hu, Xinwei Fang, Weiwei Shan, Xi Wang, Zhe Jiang

TL;DR

MEIC reframes RTL debugging as an iterative, multi-agent process that uses two specialized LLMs and a rollback-enabled repository to automatically identify and fix syntax and function errors in Verilog code. It integrates an RTL toolchain, testbenches, and simulations with error classification, domain-specific tuning, and a scorer to manage LLM uncertainty, and is accompanied by open-source tooling and a 178-instance Verilog error dataset. Empirical results show fix rates of 93% for syntax errors and 78% for function errors, with up to 48x speedup compared to experienced engineers. The work advances RTL debugging by combining iterative LLM reasoning, domain knowledge, and safeguards against LLM uncertainty (the scorer and the rollback-enabled repository), offering a scalable path toward more reliable hardware verification workflows.

Abstract

The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, the use of LLMs to debug Register Transfer Level (RTL) code remains limited, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that rely heavily on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome their limitations in RTL code debugging, making it suitable for identifying and correcting both syntax and function errors while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves a fix rate of 93% for syntax errors and 78% for function errors, with up to 48x speedup in the debugging process compared with experienced engineers. The dataset and code are available at: https://anonymous.4open.science/r/Verilog-Auto-Debug-6E7F/.

Paper Structure (14 sections, 1 equation, 12 figures, 4 tables)

This paper contains 14 sections, 1 equation, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Hardware development flow in the human world (Spec: Specification; Arch: Architecture; Req: Requirement): the flow involves specification definition, frontend development, and backend implementation. After the design requirement is defined, the RTL is coded at both IP and SoC levels. To ensure the design's correctness, multiple iterations of verification and debugging are needed, usually taking twice as long as the design phase.
  • Figure 2: MEIC overview: the framework initialises with the DUT, which is compiled and simulated by the RTL toolchain (step 0). The resultant logs and code are forwarded to the debug agent for error resolution (step 1). The revised RTL code is examined by the scorer agent (step 2) and stored in the repository (step 3), from which the highest-scored code is selected for the following debugging iteration (step 4).
  • Figure 3: Part of the input pattern for self-planning. In addition to debugging based on the provided files (lines 1-4), the agent is also required to plan the debugging process (line 5).
  • Figure 4: Reply of the LLM. With the self-planning technique, the LLM provides a set of steps for RTL analysis and debugging.
  • Figure 5: Testbench from reference model to Verilog.
  • ...and 7 more figures
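
The loop described in the Figure 2 caption (steps 0 to 4) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `meic_loop`, the `Candidate` record, the three callables (`run_toolchain`, `debug_agent`, `scorer_agent`), their signatures, and the `max_iters` limit are all hypothetical stand-ins for the RTL toolchain and the two LLM agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """One revised RTL version together with the score it received."""
    code: str
    score: float


def meic_loop(
    dut: str,
    run_toolchain: Callable[[str], Tuple[bool, str]],  # hypothetical: code -> (passed, log)
    debug_agent: Callable[[str, str], str],            # hypothetical: (code, log) -> revised code
    scorer_agent: Callable[[str, str], float],         # hypothetical: (code, log) -> score
    max_iters: int = 10,                               # assumed iteration budget
) -> Tuple[str, bool, int]:
    """Return (best code so far, whether it passed, iterations used)."""
    repository: List[Candidate] = []
    current = dut
    for iteration in range(max_iters):
        # Step 0: compile and simulate the current design.
        passed, log = run_toolchain(current)
        if passed:
            return current, True, iteration
        # Step 1: the debug agent proposes a fix from the code and logs.
        revised = debug_agent(current, log)
        # Step 2: the scorer agent rates the revised code.
        score = scorer_agent(revised, log)
        # Step 3: store the scored revision in the repository.
        repository.append(Candidate(revised, score))
        # Step 4: continue from the highest-scored version, which
        # effectively rolls back when the latest revision scored worse.
        current = max(repository, key=lambda c: c.score).code
    # Budget exhausted: check the last selected candidate once more.
    passed, _ = run_toolchain(current)
    return current, passed, max_iters
```

Selecting the highest-scored entry in step 4, rather than always continuing from the latest revision, is what lets a poor LLM edit be discarded without losing earlier progress.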