Faver: Boosting LLM-based RTL Generation with Function Abstracted Verifiable Middleware

Jianan Mu, Mingyu Shi, Yining Wang, Tianmeng Yang, Bin Sun, Xing Hu, Jing Ye, Huawei Li

TL;DR

This work tackles the challenge of automating RTL generation with LLMs by addressing the semantic gap between high-level descriptions and clocked hardware semantics. It introduces Faver, a function abstracted verifiable middleware that lets LLMs write verification code in high-level languages while a function-class abstraction aligns hardware timing and state, enabling effective Python–Verilog co-simulation. The framework comprises verification-spec generation, a class-template-based reference model with time-variable extraction, and hierarchical test stimuli generated through LLM–rule collaboration, followed by iterative co-simulation and refinement. Empirical results show that Faver improves RTL-generation correctness across multiple models and benchmarks by up to 14%, and ablation studies confirm that reference-model generation and hierarchical stimuli each contribute to verification accuracy and robustness.

Abstract

LLM-based RTL generation is an interesting research direction, as it has the potential to automate the least automated stage of the current chip design flow. However, due to the substantial semantic gap between high-level specifications and RTL, coupled with limited training data, existing models struggle with generation accuracy. Human experience shows that designing with verification helps improve accuracy. However, RTL testbench data are even scarcer than RTL design data, so writing testbenches in RTL is poorly suited to LLMs. LLMs excel at higher-level languages such as Python and C, but these languages have a large semantic gap from RTL. When implementing the same functionality, Python/C code and hardware code differ significantly in spatiotemporal granularity, requiring the LLM not only to consider high-level functional semantics but also to ensure that the low-level details align with the circuit code, which is not an easy task. In this paper, we propose a function abstracted verifiable middleware (Faver) that streamlines RTL verification in LLM-based workflows. By mixing LLM-friendly code structures with a rule-based template, Faver decouples the details of circuit verification, allowing the LLM to focus on the functionality itself. In our experiments on the SFT model and open-source models, Faver improved the model's generation accuracy by up to 14%.
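To make the idea of a class-based reference model concrete, here is a minimal sketch, not the paper's actual template. It assumes a hypothetical 4-bit counter with synchronous reset as the design under test. The names `CounterRefModel`, `step`, and `compare_traces`, and the DUT trace itself, are illustrative assumptions. State that must survive across clock cycles lives in the class instance, and a fixed driver loop applies the stimuli cycle by cycle, so the functional code in `step` never handles timing directly.

```python
from dataclasses import dataclass


@dataclass
class CounterRefModel:
    """Hypothetical reference model: a counter with synchronous reset."""
    width: int = 4
    count: int = 0  # state carried across clock cycles

    def step(self, rst: int, en: int) -> int:
        """Advance one clock edge and return the registered output."""
        if rst:
            self.count = 0
        elif en:
            self.count = (self.count + 1) % (1 << self.width)
        return self.count


def compare_traces(model, stimuli, dut_outputs):
    """Drive the model one cycle at a time and collect output mismatches."""
    mismatches = []
    for cycle, (inputs, observed) in enumerate(zip(stimuli, dut_outputs)):
        expected = model.step(**inputs)
        if expected != observed:
            mismatches.append((cycle, expected, observed))
    return mismatches


# Per-cycle input vectors (assumed format: one dict of port values per cycle).
stimuli = [
    {"rst": 1, "en": 0},
    {"rst": 0, "en": 1},
    {"rst": 0, "en": 1},
    {"rst": 0, "en": 0},
]
# Made-up DUT trace standing in for simulator output; the last cycle is wrong
# on purpose (the counter advanced even though en was 0).
dut_outputs = [0, 1, 2, 3]

report = compare_traces(CounterRefModel(), stimuli, dut_outputs)
print(report)  # each entry is (cycle, expected, observed)
```

In a real flow the `dut_outputs` list would come from a Verilog simulator rather than a hard-coded trace, and the mismatch report would feed the next round of RTL refinement.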

Paper Structure

This paper contains 16 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: (a) Using the LLM itself as a judge or writing RTL as the testbench both hinder efficient design-with-verification. (b) Motivation: LLM writes high-level code to verify RTL design. (c) Challenge: Traditional high-level code lacks timing-related inputs and variables. (d) Faver: Function abstracted verifiable middleware.
  • Figure 2: (a) Main workflow of Faver. (b) Verification specification generation. (c) Reference model and test stimuli generation based on class templates that extract time variables. (d) Test report generation and iterative Verilog regeneration.
  • Figure 3: Verification specification generation.
  • Figure 4: (a) Reference model generation in Faver. (b) Test stimuli generation in Faver.
  • Figure 5: Model of Faver.
  • ...and 4 more figures