HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Shang Wu, Yu Cao, Caiwen Ding, Yang Zhao

TL;DR

HDLxGraph addresses the challenge of applying LLMs to repository-scale HDL design by integrating HDL-specific graph representations into a Graph RAG framework. It builds a repository-level graph database using AST for code structure and DFG for hardware data flow, enabling multi-level retrieval and signal-level searches for debugging and completion. A novel HDLSearch benchmark is introduced to evaluate HDL code search in real-world HDL repositories. Experimental results show that HDLxGraph improves code search, debugging, and completion performance, illustrating the practical value of combining structural graphs with retrieval-augmented generation for HDL engineering.

Abstract

Large Language Models (LLMs) have demonstrated their potential in hardware design tasks, such as Hardware Description Language (HDL) generation and debugging. Yet their performance is hindered in real-world, repository-level HDL projects with thousands or even tens of thousands of lines of code. To this end, we propose HDLxGraph, a novel framework that integrates Graph Retrieval Augmented Generation (Graph RAG) with LLMs, introducing HDL-specific graph representations that incorporate Abstract Syntax Trees (ASTs) and Data Flow Graphs (DFGs) to capture both the code graph view and the hardware graph view. HDLxGraph uses a dual-retrieval mechanism that not only mitigates the limited recall inherent in similarity-based semantic retrieval by incorporating structural information, but also extends to various real-world tasks through task-specific retrieval fine-tuning. Additionally, to address the lack of comprehensive HDL search benchmarks, we introduce HDLSearch, a multi-granularity evaluation dataset derived from real-world repository-level projects. Experimental results demonstrate that HDLxGraph improves average search accuracy, debugging efficiency, and completion quality by 12.04%, 12.22%, and 5.04%, respectively, compared to similarity-based RAG. The HDLxGraph code and the collected HDLSearch benchmark are available at https://github.com/Nick-Zheng-Q/HDLxGraph.
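
To make the dual-retrieval idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy repository graph with hypothetical node names, AST-style containment edges, and DFG-style "drives" edges, and it swaps in a simple token-overlap (Jaccard) score for the embedding similarity a real system would use. It illustrates how a structural walk over the data-flow edges can surface a module that similarity ranking alone would place last.

```python
from collections import defaultdict, deque

# Hypothetical toy graph; none of these names come from HDLxGraph itself.
AST_EDGES = [  # (parent, child): code-structure containment
    ("module:alu", "signal:alu.op_a"),
    ("module:alu", "signal:alu.op_b"),
    ("module:alu", "signal:alu.result"),
    ("module:decoder", "signal:decoder.imm"),
]
DFG_EDGES = [  # (source, sink): the source signal drives the sink signal
    ("signal:decoder.imm", "signal:alu.op_b"),
    ("signal:alu.op_a", "signal:alu.result"),
    ("signal:alu.op_b", "signal:alu.result"),
]
DESCRIPTIONS = {  # assumed natural-language summaries of each module
    "module:alu": "arithmetic logic unit computing add sub and shift results",
    "module:decoder": "instruction decoder extracting immediate and opcode fields",
}


def jaccard(a, b):
    """Token-overlap score, a stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_modules(query):
    """Similarity-based retrieval: order modules by overlap with the query."""
    return sorted(DESCRIPTIONS,
                  key=lambda m: jaccard(query, DESCRIPTIONS[m]),
                  reverse=True)


def upstream_signals(signal):
    """Structural retrieval: breadth-first walk back along DFG edges."""
    drivers = defaultdict(list)
    for src, dst in DFG_EDGES:
        drivers[dst].append(src)
    seen, queue = [], deque([signal])
    while queue:
        for src in drivers[queue.popleft()]:
            if src not in seen:
                seen.append(src)
                queue.append(src)
    return seen


def owning_module(node):
    """Follow the AST containment edge from a signal up to its module."""
    for parent, child in AST_EDGES:
        if child == node:
            return parent
    return None


if __name__ == "__main__":
    query = "add shift results are wrong"
    print(rank_modules(query))
    for sig in upstream_signals("signal:alu.result"):
        print(sig, "in", owning_module(sig))
```

In this toy setup, the query shares no tokens with the decoder's description, so similarity ranking puts the decoder last, while the data-flow walk from `signal:alu.result` reaches `signal:decoder.imm` and thereby pulls the decoder into the retrieved context.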

Paper Structure

This paper contains 18 sections, 2 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: (Top) An illustration of the mismatch between HDL and natural language in conventional RAG, including structural and vocabulary mismatches. (Bottom) A demonstration of HDLxGraph's efficiency in bridging these mismatches by incorporating graph information, using an HDL debugging example for a CV32E40P RISC-V HDL implementation [Gautschi et al., 2017].
  • Figure 2: The overview of our proposed HDLxGraph framework.
  • Figure 3: Visualization of an example in the graph database.
  • Figure 4: Flow of multi-level retrieval containing AST and DFG retrieval.
  • Figure 5: HDLSearch benchmark generation flow.
  • ...and 5 more figures