A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation
Enrique Dehaerne, Bappaditya Dey, Sandip Halder, Stefan De Gendt
TL;DR
The paper targets inefficiencies in Verilog development for electronic design automation by introducing a deep learning framework that takes transformer-based models pretrained on large general programming-language (PL) data and finetunes them on a carefully curated Verilog corpus. A novel Verilog dataset is created with file, snippet, and labeled definition-body subsets, and is rigorously filtered and deduplicated. Experiments show that pretraining on broad PL data followed by task-focused Verilog finetuning yields substantial gains in perplexity and in snippet-level generation metrics (BLEU, ROUGE-L, chrF) over training from scratch, with the best results achieved by finetuning a CodeGen mono checkpoint on Verilog snippets. The work demonstrates the practicality of Verilog autocompletion and points toward broader downstream EDA automation, including test bench and layout generation, using similar data-curation and pretraining strategies.
Abstract
Innovative Electronic Design Automation (EDA) solutions are important to meet the design requirements for increasingly complex electronic devices. Verilog, a hardware description language, is widely used for the design and verification of digital circuits and is synthesized using specific EDA tools. However, writing Verilog code is a repetitive and time-intensive task. This paper proposes, primarily, a novel deep learning framework for training a Verilog autocompletion model and, secondarily, a Verilog dataset of files and snippets obtained from open-source repositories. The framework takes models pretrained on general programming language data and finetunes them on a dataset curated to be similar to a target downstream task. The framework is validated by comparing different pretrained models trained on different subsets of the proposed Verilog dataset using multiple evaluation metrics. These experiments demonstrate that the proposed framework improves BLEU, ROUGE-L, and chrF scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from scratch. Code and data are made available at: https://github.com/99EnriqueD/verilog_autocompletion
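To make one of the reported metrics concrete, the following is a minimal sketch of how a ROUGE-L score can be computed between a reference Verilog snippet and a generated completion. It is an illustration only, not the evaluation code used in the paper: the whitespace tokenization, the balanced F-measure, and the function names `lcs_length` and `rouge_l_f1` are assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as a balanced F-measure over whitespace tokens (assumed setup)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of generated tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the completion differs from the reference by one operator.
reference = "assign y = a & b ;"
candidate = "assign y = a | b ;"
score = rouge_l_f1(reference, candidate)
print(f"ROUGE-L F1: {score:.3f}")
```

Because ROUGE-L rewards the longest in-order token overlap, a completion that gets a single operator wrong still scores highly, which is one reason the paper reports several complementary metrics rather than relying on one.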
