
HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL

Heng Ping, Shixuan Li, Peiyu Zhang, Anzhe Cheng, Shukai Duan, Nikos Kanakaris, Xiongye Xiao, Wei Yang, Shahin Nazarian, Andrei Irimia, Paul Bogdan

TL;DR

HDLCoRe tackles hallucinations in LLM-generated HDL code with a training-free framework that combines HDL-aware Chain-of-Thought prompting with self-verification and a two-stage heterogeneous RAG system. It classifies HDL tasks by type and complexity, augments prompts with domain knowledge, generates and self-validates testbenches, and retrieves relevant HDL exemplars from a curated open-source database to guide generation. On RTLLM2.0, HDLCoRe yields notable improvements in functional correctness, with particularly large gains for smaller models, and shows robustness across model scales and HDL task categories. This approach provides a practical, data-efficient path to higher-quality HDL generation without fine-tuning or external tooling, potentially accelerating hardware design workflows and making HDL code generation more broadly accessible.

Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, when applied to hardware description languages (HDL), these models exhibit significant limitations due to data scarcity, resulting in hallucinations and incorrect code generation. To address these challenges, we propose HDLCoRe, a training-free framework that enhances LLMs' HDL generation capabilities through prompt engineering techniques and retrieval-augmented generation (RAG). Our approach consists of two main components: (1) an HDL-aware Chain-of-Thought (CoT) prompting technique with self-verification that classifies tasks by complexity and type, incorporates domain-specific knowledge, and guides LLMs through step-by-step self-simulation for error correction; and (2) a two-stage heterogeneous RAG system that addresses formatting inconsistencies through key component extraction and efficiently retrieves relevant HDL examples through sequential filtering and re-ranking. HDLCoRe eliminates the need for model fine-tuning while substantially improving LLMs' HDL generation capabilities. Experimental results demonstrate that our framework achieves superior performance on the RTLLM2.0 benchmark, significantly reducing hallucinations and improving both syntactic and functional correctness.

Paper Structure

This paper contains 20 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the HDLCoRe framework. (a) HDL-aware CoT with Self-verification Module. We first process the design description through task-specific prompts (SC-HDL, CC-HDL, SS-HDL, CS-HDL) to generate initial HDL code. The framework then prompts for testbench creation followed by step-by-step self-simulation to identify errors and optimize the code. (b) Efficient Heterogeneous RAG System. We extract multiple key components (High Level, Low Level, Module Header) from both the task description and the heterogeneous database, compute similarity scores ($S_{HL}$, $S_{LL}$, $S_{MH}$), and perform two-stage retrieval: a broad filtering stage first selects the top-k samples from each category, and a refined re-ranking stage then identifies the most relevant HDL examples for the target task.
  • Figure 2: Self-verification mechanism. The LLM generates initial HDL code and testbench, performs step-by-step self-simulation, summarizes the results, and refines the code based on identified issues.
  • Figure 3: Ablation study of the techniques adopted in our framework. Pass@1 functional correctness is reported on the RTLLM2.0 dataset.
  • Figure 4: CoT Classification mechanism. We develop scripts that automatically classify the logical category of each problem description. Meanwhile, the LLM autonomously evaluates the complexity of each problem based on its internal knowledge.
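
The two-stage retrieval described in the Figure 1(b) caption can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the `Entry` record, the `two_stage_retrieve` function, the token-level Jaccard similarity standing in for $S_{HL}$, $S_{LL}$, and $S_{MH}$, the equal-weight sum used for re-ranking, and the toy database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical record holding the three extracted key components."""
    name: str
    high_level: str
    low_level: str
    module_header: str


FIELDS = ("high_level", "low_level", "module_header")


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; an assumed stand-in for the paper's scores."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def two_stage_retrieve(query: Entry, database: list[Entry],
                       k: int = 2, n: int = 1) -> list[Entry]:
    # Stage 1 (broad filtering): keep the top-k entries per component.
    candidates: dict[str, Entry] = {}
    for field in FIELDS:
        ranked = sorted(
            database,
            key=lambda e: jaccard(getattr(query, field), getattr(e, field)),
            reverse=True,
        )
        for entry in ranked[:k]:
            candidates[entry.name] = entry

    # Stage 2 (re-ranking): order the candidate pool by a combined score.
    # The equal-weight sum is an assumption, not taken from the paper.
    def combined(e: Entry) -> float:
        return sum(jaccard(getattr(query, f), getattr(e, f)) for f in FIELDS)

    return sorted(candidates.values(), key=combined, reverse=True)[:n]


# Toy database, invented for this example only.
db = [
    Entry("adder8", "8-bit ripple carry adder", "sum of a and b plus cin",
          "module adder8(input [7:0] a, input [7:0] b, input cin, output [7:0] sum, output cout)"),
    Entry("counter", "4-bit up counter with reset", "increment count on each clock edge",
          "module counter(input clk, input rst, output [3:0] count)"),
    Entry("mux2", "2-to-1 multiplexer", "select a or b using sel",
          "module mux2(input a, input b, input sel, output y)"),
]
query = Entry("task", "8-bit adder with carry", "sum of a and b plus carry in",
              "module adder(input [7:0] a, input [7:0] b, input cin, output [7:0] sum, output cout)")

best = two_stage_retrieve(query, db, k=2, n=1)
print(best[0].name)
```

Filtering per component before re-ranking means an entry that matches strongly on only one view (for example, the module header) still reaches the candidate pool, while the combined score decides the final order.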