
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

Mingjie Liu, Yun-Da Tsai, Wenfei Zhou, Haoxing Ren

TL;DR

CraftRTL tackles two core bottlenecks in LLM-based Verilog code generation: handling non-textual representations (Karnaugh maps, FSM state-transition diagrams, waveforms) and the variability of training outcomes. The authors introduce correct-by-construction (CC) data for non-textual problems and an automated, error-report-driven targeted code repair pipeline that fixes the minor mistakes models repeatedly make during fine-tuning. By combining synthetic data generation (SDG) data, CC data, and Repair data, and evaluating on the VerilogEval and RTLLM benchmarks, the approach achieves state-of-the-art results: the fine-tuned Starcoder2-15B shows substantial pass@1 gains and more stable training. The work also highlights the importance of non-textual data representations in hardware design and provides reproducible pipelines and prompts to support broader adoption and extension to other HDL domains.
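To make the correct-by-construction idea concrete for Karnaugh maps, the sketch below samples a random 4-input Boolean function, renders it as a text K-map with Gray-code row and column ordering (the non-textual problem statement), and emits a Verilog reference solution derived from the same truth table, so problem and answer agree by construction. The function names, the text layout, and the `top_module` module name are hypothetical illustrations, not the authors' generator; a real pipeline would likely also minimize the expression and vary map sizes and formats.

```python
import random
from typing import List, Sequence

# Gray-code order for K-map rows (variables a, b) and columns (variables c, d),
# so that adjacent cells differ in exactly one variable.
GRAY = [0b00, 0b01, 0b11, 0b10]


def random_truth_table(seed: int = 0) -> List[int]:
    """Sample a random 4-input Boolean function as 16 output bits indexed by minterm abcd."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(16)]


def render_kmap(table: Sequence[int]) -> str:
    """Render the truth table as a text K-map (the non-textual problem statement)."""
    lines = ["ab\\cd " + " ".join(f"{g:02b}" for g in GRAY)]
    for r in GRAY:
        cells = [str(table[(r << 2) | c]) for c in GRAY]
        lines.append(f"   {r:02b}  " + "  ".join(cells))
    return "\n".join(lines)


def to_verilog(table: Sequence[int], names: Sequence[str] = ("a", "b", "c", "d")) -> str:
    """Emit a reference solution as the canonical sum of minterms of the same table."""
    n = len(names)
    terms = []
    for m, bit in enumerate(table):
        if not bit:
            continue
        # Bit (n - 1 - i) of the minterm index holds the value of variable names[i].
        lits = [name if (m >> (n - 1 - i)) & 1 else f"~{name}" for i, name in enumerate(names)]
        terms.append("(" + " & ".join(lits) + ")")
    expr = " | ".join(terms) if terms else "1'b0"
    ports = ", ".join(f"input {x}" for x in names)
    return f"module top_module({ports}, output out);\n  assign out = {expr};\nendmodule\n"


if __name__ == "__main__":
    tt = random_truth_table(seed=42)
    print(render_kmap(tt))  # problem: the K-map shown to the model
    print(to_verilog(tt))   # answer: derived from the same table, so it matches by construction
```

Because both outputs are computed from one truth table, no LLM is needed to label the answer, which is what makes this style of data correct by construction.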

Abstract

Despite the significant progress made in code generation with large language models, challenges persist, especially with hardware description languages such as Verilog. This paper first analyzes LLMs fine-tuned for Verilog coding on synthetic data from prior methods. We identify two main issues: difficulties in handling non-textual representations (Karnaugh maps, state-transition diagrams, and waveforms) and significant variability during training, with models randomly making "minor" mistakes. To address these limitations, we enhance data curation by creating correct-by-construction data targeting non-textual representations. Additionally, we introduce an automated framework that generates error reports from various model checkpoints and injects these errors into open-source code to create targeted code repair data. Our fine-tuned Starcoder2-15B outperforms prior state-of-the-art results by 3.8%, 10.9%, and 6.6% in pass@1 on VerilogEval-Machine, VerilogEval-Human, and RTLLM, respectively.
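The pass@1 figures above are commonly computed with the unbiased pass@k estimator used in the HumanEval/Codex evaluation: for each problem, draw n samples, count the c that pass the testbench, and average 1 - C(n-c, k) / C(n, k) over problems. Whether CraftRTL uses exactly this estimator, and how many samples it draws per problem, is an assumption here; the sketch only illustrates the arithmetic, which for k = 1 reduces to c / n.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Benchmark score: average of per-problem pass@k over (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Three hypothetical problems, 20 samples each, with 5, 0, and 20 passing samples.
print(mean_pass_at_k([(20, 5), (20, 0), (20, 20)], k=1))
```

For k = 1 the three hypothetical problems score 0.25, 0, and 1, so the benchmark score is their mean, about 0.417.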

Paper Structure (69 sections, 1 equation, 29 figures, 17 tables, 1 algorithm)

Figures (29)

  • Figure 1: Our methods reduce pass rate variability during training: SDG (left) shows high volatility with significant degradation on many problems, while SDG-CC-Repair (right) stabilizes learning outcomes on solvable problems (details in the appendix).
  • Figure 2: State transition logic.
  • Figure 3: Overview of our approach for generating targeted code repair data: (1) prompting the LLM to generate detailed error reports from correct and erroneous code, (2) validating error report quality by ensuring the LLM can debug the errors based on the report, and (3) leveraging the LLM to inject similar errors into open-source code, creating a diverse training dataset.
  • Figure 4: pass@1 on non-textual problems versus the total amount of CC data (sampling temperature 0.8).
  • Figure 5: An example of the targeted code repair process. Here, checkpoints saved during training sometimes generate correct solutions and sometimes erroneous ones for the same problem. We use LLMs to first summarize the errors into a detailed Error Report and then inject the errors into open-source code to construct Repair data.
  • ...and 24 more figures
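Figures 3 and 5 describe the targeted repair pipeline in three steps: write an error report from a correct and an erroneous checkpoint solution, validate the report by checking that an LLM can fix the bug with it, and inject similar errors into open-source code. The Python sketch below wires these steps together around two hypothetical callables, `llm` (prompt in, text out) and `passes` (a testbench check). The prompt wording, the `RepairSample` record, and the rule that discards unchanged injections are illustrative assumptions, not the paper's actual prompts or filtering.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical interfaces (assumptions, not the paper's code):
#   LLM: maps a prompt string to a completion string.
#   Checker: returns True if a Verilog solution passes the problem's testbench.
LLM = Callable[[str], str]
Checker = Callable[[str], bool]


@dataclass
class RepairSample:
    """One targeted code-repair training example."""
    error_report: str
    buggy_code: str
    fixed_code: str


def make_error_report(llm: LLM, problem: str, correct: str, buggy: str) -> str:
    """Step 1: contrast a correct and an erroneous checkpoint solution."""
    return llm(f"Problem:\n{problem}\nCorrect code:\n{correct}\nErroneous code:\n{buggy}\n"
               "Write a detailed error report describing the mistake and how to fix it.")


def report_is_useful(llm: LLM, problem: str, buggy: str, report: str, passes: Checker) -> bool:
    """Step 2: keep the report only if the LLM can fix the bug using it."""
    fixed = llm(f"Problem:\n{problem}\nBuggy code:\n{buggy}\nError report:\n{report}\n"
                "Return only the corrected code.")
    return passes(fixed)


def inject_error(llm: LLM, report: str, clean_code: str) -> Optional[RepairSample]:
    """Step 3: inject a similar mistake into clean open-source code."""
    buggy = llm(f"Error report:\n{report}\nCode:\n{clean_code}\n"
                "Introduce a similar mistake and return only the modified code.")
    if buggy.strip() == clean_code.strip():
        return None  # nothing was injected, so there is nothing to repair
    return RepairSample(error_report=report, buggy_code=buggy, fixed_code=clean_code)


def build_repair_data(llm: LLM, problem: str, correct: str, buggy: str,
                      passes: Checker, corpus: Sequence[str]) -> List[RepairSample]:
    """Run steps 1-3 for one (correct, erroneous) pair against an open-source corpus."""
    report = make_error_report(llm, problem, correct, buggy)
    if not report_is_useful(llm, problem, buggy, report, passes):
        return []
    samples = [inject_error(llm, report, code) for code in corpus]
    return [s for s in samples if s is not None]
```

In practice `llm` would wrap a strong instruction-tuned model and `passes` a simulator-based testbench; both are left abstract so the control flow (report, validate, inject) stays visible. Each resulting sample pairs buggy code with its clean original, which is the shape of data a repair-style fine-tuning task needs.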