DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Yi Liu, Changran Xu, Yunhao Zhou, Zeju Li, Qiang Xu

TL;DR

DeepRTL introduces a unified representation model that jointly addresses Verilog understanding and generation, filling a gap left by prior work, which emphasized generation alone. It uses a high-quality, multi-level NL-Verilog dataset (line, block, and module) built from open-source and proprietary code. The dataset is annotated by GPT-4 using chain-of-thought (CoT) prompting and refined by human engineers. The paper also proposes the first Verilog understanding benchmark, which uses embedding similarity and GPT Score for semantic evaluation instead of surface-level metrics such as BLEU and ROUGE. By applying curriculum learning to the fine-tuning of CodeT5+, DeepRTL outperforms GPT-4 in Verilog understanding and is comparable to o1-preview in Verilog generation despite its compact model size, demonstrating strong practical potential for hardware design automation.

Abstract

Recent advancements in large language models (LLMs) have shown significant potential for automating hardware description language (HDL) code generation from high-level natural language instructions. While fine-tuning has improved LLMs' performance in hardware design tasks, prior efforts have largely focused on Verilog generation, overlooking the equally critical task of Verilog understanding. Furthermore, existing models suffer from weak alignment between natural language descriptions and Verilog code, hindering the generation of high-quality, synthesizable designs. To address these issues, we present DeepRTL, a unified representation model that excels in both Verilog understanding and generation. Based on CodeT5+, DeepRTL is fine-tuned on a comprehensive dataset that aligns Verilog code with rich, multi-level natural language descriptions. We also introduce the first benchmark for Verilog understanding and are the first to apply embedding similarity and GPT Score to evaluate models' understanding capabilities. These metrics capture semantic similarity more accurately than traditional methods like BLEU and ROUGE, which are limited to surface-level n-gram overlaps. By adapting curriculum learning to train DeepRTL, we enable it to significantly outperform GPT-4 in Verilog understanding tasks while achieving performance on par with OpenAI's o1-preview model in Verilog generation tasks.
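To make the contrast between surface-level n-gram overlap and embedding similarity concrete, here is a minimal Python sketch. It computes clipped unigram precision (the quantity BLEU and ROUGE build on) and cosine similarity between two embedding vectors. The function names are hypothetical, and the vectors are hand-made stand-ins rather than outputs of any real model; in practice they would come from a sentence-embedding model, and which model the DeepRTL benchmark uses is not stated here.

```python
import math
from collections import Counter


def ngram_overlap(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision: the surface-level overlap BLEU/ROUGE build on."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it appears in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two paraphrased descriptions of the same adder module.
reference = "module adds two 8-bit inputs and outputs the sum"
candidate = "this design computes the addition of a pair of byte operands"

# Hypothetical embeddings: stand-ins for what an embedding model might return
# for two sentences with the same meaning.
reference_vec = [0.82, 0.41, 0.10]
candidate_vec = [0.78, 0.45, 0.12]

print("unigram overlap:", round(ngram_overlap(candidate, reference), 3))
print("embedding cosine:", round(cosine_similarity(reference_vec, candidate_vec), 3))
```

Only the word "the" is shared, so the overlap score is near zero even though both sentences describe the same circuit; a semantic metric can still rate the pair as highly similar.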

Paper Structure

This paper contains 30 sections, 10 figures, 7 tables.

Figures (10)

  • Figure 1: The overview of the data annotation process. We employ the CoT approach and the SOTA LLM, GPT-4, for annotation. Annotations span three levels—line, block, and module—providing both detailed specifications and high-level functional descriptions.
  • Figure 2: An example of our comprehensive annotation for a complete Verilog module.
  • Figure 3: The overview of the instruction construction process and the curriculum learning strategy. For instruction construction, we integrate various settings, e.g., task type, granularity, and comment level, to create tailored instructions for specific scenarios. The curriculum learning strategy involves three hierarchical stages: training progresses from line-level to module-level code (stage 1), transitions from detailed to high-level descriptions at each level (stage 2), and advances from GPT-annotated to human-annotated descriptions for each granularity (stage 3).
  • Figure 4: Detailed prompts used in the CoT annotation process.
  • Figure 5: The distribution of the token lengths of the generation benchmark by chang2024natural.
  • ...and 5 more figures
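The instruction construction and curriculum ordering described in the Figure 3 caption can be sketched in Python as follows. This is one plausible reading of the caption, not the authors' implementation: the `Sample` fields, label strings, and prompt templates are hypothetical, the comment-level setting is omitted, and the three stages are interpreted as a nested sort order (granularity outermost, then description level, then annotation source).

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical ordinal ranks (easier first); the paper's exact labels are not given here.
GRANULARITY = {"line": 0, "block": 1, "module": 2}
DETAIL = {"detailed": 0, "high_level": 1}
SOURCE = {"gpt": 0, "human": 1}


@dataclass(frozen=True)
class Sample:
    task: str         # "understanding" or "generation"
    granularity: str  # "line", "block" or "module"
    detail: str       # "detailed" or "high_level"
    source: str       # "gpt" or "human"
    code: str
    description: str


def build_instruction(s: Sample) -> str:
    """Compose an instruction from task type, granularity and description level (hypothetical template)."""
    level = "detailed" if s.detail == "detailed" else "high-level"
    if s.task == "understanding":
        return f"Give a {level} description of the following {s.granularity}-level Verilog code:\n{s.code}"
    return f"Write {s.granularity}-level Verilog code matching this {level} description:\n{s.description}"


def curriculum_key(s: Sample) -> tuple:
    # Stage 1: line -> block -> module (outermost ordering).
    # Stage 2: detailed -> high-level descriptions within each granularity.
    # Stage 3: GPT-annotated -> human-annotated within each of the above.
    return (GRANULARITY[s.granularity], DETAIL[s.detail], SOURCE[s.source])


def curriculum_phases(samples):
    """Sort samples easy-to-hard and group them into consecutive training phases."""
    ordered = sorted(samples, key=curriculum_key)
    return [(key, list(group)) for key, group in groupby(ordered, key=curriculum_key)]


samples = [
    Sample("understanding", "module", "high_level", "human",
           "module top(input a, output y); assign y = a; endmodule", ""),
    Sample("generation", "line", "detailed", "gpt",
           "assign y = a & b;", "y is the bitwise AND of a and b"),
    Sample("understanding", "block", "detailed", "gpt",
           "always @(posedge clk) q <= d;", ""),
]

for _, group in curriculum_phases(samples):
    first = group[0]
    print(first.granularity, first.detail, first.source, len(group))
    print(build_instruction(first))
```

Sorting on a tuple key makes each later stage a tie-breaker within the earlier one, so every phase still respects the line-to-module progression.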