ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation

Peiyang Wu, Nan Guo, Xiao Xiao, Wenming Li, Xiaochun Ye, Dongrui Fan

TL;DR

The paper addresses RTL (Verilog) code generation with LLMs under data scarcity and distribution mismatch. It introduces ITERTL, an iterative sampling-training framework that uses feedback from RTL tools to rank and filter model outputs, combined with a plug-and-play data filter that enforces self-contained RTL modules. ITERTL achieves state-of-the-art performance with limited data, reaching 53.8% pass@1 on the VerilogEval-human benchmark, and outperforms several baselines under similar data conditions. The authors acknowledge that models trained on very large, high-quality datasets can still score higher. The approach is data-efficient and robust, and may apply to other domains where domain tools provide automatic feedback.
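The headline number above is a pass@1 rate. As a rough illustration of how such a figure can be computed, the sketch below uses the unbiased pass@k estimator that is common in code-generation evaluation. Whether the paper computes its score exactly this way is an assumption, and the function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Uses 1 - C(n - c, k) / C(n, k): the probability that a random draw of
    k samples out of the n generated ones contains at least one pass.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n_samples, n_passed) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over all problems in the benchmark.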

Abstract

Recently, large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation and improving hardware design efficiency. However, existing approaches to fine-tuning LLMs for RTL generation are typically conducted on fixed datasets, which do not fully stimulate the capability of LLMs and require large amounts of reference data that are costly to acquire. To mitigate these issues, we introduce an iterative training paradigm named ITERTL. During each iteration, samples are drawn from the model trained in the previous cycle. These new samples are then used for training in the current loop. Furthermore, we introduce a plug-and-play data filtering strategy that encourages the model to generate high-quality, self-contained code. Our model outperforms GPT4 and state-of-the-art (SOTA) open-source models, achieving a 53.8% pass@1 rate on the VerilogEval-human benchmark. Under similar conditions of data quantity and quality, our approach significantly outperforms the baseline. Extensive experiments validate the effectiveness of the proposed method.
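The iterative paradigm described in the abstract can be pictured as a sample, rank, train loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the model, the tool-based scorer, and the training step are toy stand-ins, and every name (`iterative_finetune`, `make_toy_sampler`, `toy_score`, `toy_train`) is hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed interfaces: a sampler maps (prompt, n) to n candidate programs,
# a scorer maps one candidate to a number (higher is better), and a
# trainer maps (prompt, code) pairs to a new sampler.
Sampler = Callable[[str, int], List[str]]
Scorer = Callable[[str], float]
Trainer = Callable[[List[Tuple[str, str]]], Sampler]


def iterative_finetune(sampler: Sampler, train: Trainer, score: Scorer,
                       prompts: List[str], rounds: int,
                       n_samples: int, keep: int) -> Sampler:
    """Each round: sample from the current model, rank, train on the best."""
    for _ in range(rounds):
        batch: List[Tuple[str, str]] = []
        for prompt in prompts:
            candidates = sampler(prompt, n_samples)
            ranked = sorted(candidates, key=score, reverse=True)
            batch.extend((prompt, code) for code in ranked[:keep])
        # The model trained here supplies the samples for the next round.
        sampler = train(batch)
    return sampler


# Toy stand-ins so the loop runs end to end. "Quality" is simply the
# number of '+' characters in the generated string.
def make_toy_sampler(skill: int) -> Sampler:
    def sample(prompt: str, n: int) -> List[str]:
        return [f"{prompt}:{'+' * (skill + i)}" for i in range(n)]
    return sample


def toy_score(code: str) -> float:
    return float(code.count("+"))


def toy_train(batch: List[Tuple[str, str]]) -> Sampler:
    best = max(toy_score(code) for _, code in batch)
    return make_toy_sampler(int(best))
```

In a real setting the scorer would wrap feedback from an RTL tool and the trainer would run a fine-tuning step. The toy versions only show the data flow between iterations.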

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Fine-tuning dataset size and pass@1 on VerilogEval-human. Our method achieves excellent results relying on limited data. We use * to denote models trained using higher-quality data than RTLCoder-27k [liu2023rtlcoder].
  • Figure 2: Our proposed ITERTL, which further introduces the iterative training paradigm and the data filter to significantly enhance the model's capability.
  • Figure 3: Examples of output code for a 4-bit adder. (a) shows the model trained directly on unfiltered data generating incomplete code that lacks the implementation of the submodule. (b) shows the model trained on filtered data outputting a correct implementation.
  • Figure 4: The loss function curves across each iteration. For better visualization, we represent the original loss function curves with light-colored lines and the results of exponential smoothing with dark-colored lines. The vertical axis is on a log scale. As the iteration count increases, the loss function decreases.
  • Figure 5: Examples of output code for a half adder. (a) shows the incorrect implementation generated by the model trained without iteration. (b) shows the correct implementation generated by the model trained with iteration.
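
The failure mode in Figure 3(a), a top-level module that instantiates a submodule it never defines, suggests what a self-containment check might look for. Below is a rough regex-based sketch of such a check. It is an assumption about how a filter of this kind could work, not the paper's filter, and the helper names and keyword list are hypothetical. A production filter would more likely rely on a real Verilog parser or compiler.

```python
import re

# Words that can start a line followed by an identifier and "(" without
# being a module instantiation. Deliberately incomplete (illustrative).
KEYWORDS = frozenset({
    "module", "endmodule", "function", "task", "if", "else", "for",
    "while", "case", "assign", "always", "initial", "begin", "end",
    "input", "output", "inout", "wire", "reg", "generate",
})

MODULE_DEF = re.compile(r"^[ \t]*module[ \t]+([A-Za-z_]\w*)", re.MULTILINE)
# "<type> [#(params)] <instance_name> (" at the start of a line.
INSTANCE = re.compile(
    r"^[ \t]*([A-Za-z_]\w*)[ \t]+(?:#[ \t]*\(.*?\)[ \t]*)?([A-Za-z_]\w*)[ \t]*\(",
    re.MULTILINE,
)


def strip_comments(src: str) -> str:
    """Remove /* block */ and // line comments."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", src)


def undefined_submodules(src: str) -> set:
    """Return module types that are instantiated but not defined in src."""
    code = strip_comments(src)
    defined = set(MODULE_DEF.findall(code))
    used = {
        m.group(1)
        for m in INSTANCE.finditer(code)
        if m.group(1) not in KEYWORDS and m.group(2) not in KEYWORDS
    }
    return used - defined


def is_self_contained(src: str) -> bool:
    """True when every instantiated module type is defined in the sample."""
    return not undefined_submodules(src)
```

Applied to a 4-bit adder that instantiates `full_adder` without defining it, `undefined_submodules` would report `{"full_adder"}`, so a filter built on `is_self_contained` would drop that sample from the training set.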