DeepV: A Model-Agnostic Retrieval-Augmented Framework for Verilog Code Generation with a High-Quality Knowledge Base
Zahin Ibnat, Paul E. Calzada, Rasin Mohammed Ihtemam, Sujan Kumar Saha, Jingbo Zhou, Farimah Farahmandi, Mark Tehranipoor
TL;DR
This work addresses the challenge of generating high-quality RTL Verilog code with LLMs by proposing DeepV, a model-agnostic retrieval-augmented framework that grounds generation in a large, curated Verilog database (VerilogDB) via FAISS-based dynamic retrieval. By augmenting prompts with carefully retrieved, syntactically correct and synthesizable examples, DeepV significantly improves syntax and functional correctness across diverse back-end LLMs and outperforms state-of-the-art fine-tuned RTL generators on VerilogEval. The approach reduces the need for costly fine-tuning, enables continuous knowledge-base updates, and provides a public Hugging Face Space for broad accessibility and reproducibility. Together, these contributions advance practical, scalable RTL design automation by leveraging high-quality external knowledge and intelligent retrieval to enhance general-purpose LLMs.
Abstract
As large language models (LLMs) continue to be integrated into modern technology, there has been an increased push towards code generation applications, which naturally extends to hardware design automation. LLM-based solutions for register transfer level (RTL) code generation for intellectual property (IP) designs have grown in number, with fine-tuned LLMs, prompt engineering, and agentic approaches becoming popular in the literature. However, these techniques leave a gap: they fail to integrate novel IPs into the model's knowledge base, which results in poorly generated code. Additionally, as general-purpose LLMs continue to improve, methods fine-tuned on older models will struggle to compete in producing accurate and efficient designs. Although some retrieval-augmented generation (RAG) techniques exist to mitigate the challenges of fine-tuning approaches, existing works tend to leverage low-quality codebases, incorporate computationally expensive fine-tuning in their frameworks, or do not use RAG directly in the RTL generation step. In this work, we introduce DeepV: a model-agnostic RAG framework that generates RTL designs by enhancing context through a large, high-quality dataset without any RTL-specific training. Our framework benefits the latest commercial LLM, OpenAI's GPT-5, with a nearly 17% increase in performance on the VerilogEval benchmark. We host DeepV for use by the community in a Hugging Face (HF) Space: https://huggingface.co/spaces/FICS-LLM/DeepV.
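To make the retrieve-then-augment idea concrete, the following is a minimal sketch, not the DeepV implementation. It assumes a tiny in-memory knowledge base and swaps in a bag-of-words cosine similarity for the FAISS-based dense retrieval described above, so that it runs with the Python standard library alone. All names (`KNOWLEDGE_BASE`, `embed`, `retrieve`, `build_prompt`, `generate_rtl`) are hypothetical, and the back-end LLM is passed in as a plain callable to reflect the model-agnostic design.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stand-in for a curated knowledge base:
# (natural-language description, Verilog code) pairs.
KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("2-to-1 multiplexer selecting between two inputs",
     "module mux2(input a, input b, input sel, output y);\n"
     "  assign y = sel ? b : a;\n"
     "endmodule"),
    ("4-bit synchronous up counter with reset",
     "module counter4(input clk, input rst, output reg [3:0] q);\n"
     "  always @(posedge clk)\n"
     "    if (rst) q <= 4'd0; else q <= q + 4'd1;\n"
     "endmodule"),
    ("half adder producing sum and carry",
     "module half_adder(input a, input b, output s, output c);\n"
     "  assign s = a ^ b;\n"
     "  assign c = a & b;\n"
     "endmodule"),
]


def embed(text: str) -> Counter:
    """Toy embedding: token counts. A real system would use a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k knowledge-base entries most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda entry: cosine(q, embed(entry[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Prepend the retrieved examples to the user's design request."""
    parts = ["Reference Verilog examples:"]
    for description, code in examples:
        parts.append(f"// {description}\n{code}")
    parts.append(f"Task: {query}\nWrite synthesizable Verilog for the task.")
    return "\n\n".join(parts)


def generate_rtl(query: str, llm: Callable[[str], str], k: int = 2) -> str:
    """Model-agnostic entry point: any callable mapping prompt -> text works."""
    return llm(build_prompt(query, retrieve(query, k)))
```

Because the language model is just a callable, the back-end can be swapped without touching the retrieval step, and new designs can be added to the knowledge base without any retraining. In a sketch closer to the described setup, `embed` and `retrieve` would be replaced by a dense encoder and a vector index.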
