Bottom-Up Generation of Verilog Designs for Testing EDA Tools
João Victor Amorim Vieira, Luiza de Melo Gomes, Rafael Sumitani, Raissa Maciel, Augusto Mafra, Mirlaine Crepalde, Fernando Magno Quintão Pereira
TL;DR
The paper tackles the scarcity of Verilog benchmarks for testing EDA tools and training Verilog-aware LLMs by introducing ChiGen, a bottom-up fuzzer that generates realistic Verilog designs. ChiGen builds a skeleton from a Probabilistic Context-Free Grammar trained on real designs. It then refines that skeleton through scope-aware renaming, Hindley-Milner type inference, and interactive code injection guided by reaching-definition analysis. Key contributions include the ChiBench training dataset, a grammar that is context-sensitive up to $K$ contexts, a two-stage type inference engine, and an injection mechanism that scales designs while preserving their validity. Empirically, ChiGen achieves higher syntactic diversity and code coverage than existing fuzzers, uncovers numerous tool bugs, and sustains high throughput, making it a practical tool for robust EDA testing and benchmark generation.
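To make the skeleton-generation step more concrete, here is a minimal Python sketch of sampling a Verilog-like token sequence from a probabilistic grammar. The grammar `TOY_PCFG`, its probabilities, and the helpers `sample` and `generate_skeleton` are hypothetical illustrations, not ChiGen's grammar or API. ChiGen learns its probabilities from real designs and conditions on up to $K$ contexts, whereas this toy grammar is purely context-free and its weights are made up. Identifiers are left as the placeholder token `ID`, standing in for names that a later renaming pass would fill in.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy PCFG: nonterminal -> list of (weight, right-hand side).
# Nonterminals are written in angle brackets; every other symbol is a terminal.
# Assumption: the FIRST production of each nonterminal is non-recursive, so it
# can serve as a fallback that guarantees termination once the depth cap is hit.
TOY_PCFG: Dict[str, List[Tuple[float, List[str]]]] = {
    "<module>": [(1.0, ["module", "ID", "(", "<ports>", ")", ";", "<items>", "endmodule"])],
    "<ports>": [(0.6, ["ID"]), (0.4, ["ID", ",", "<ports>"])],
    "<items>": [(0.5, ["<item>"]), (0.5, ["<item>", "<items>"])],
    "<item>": [(0.5, ["wire", "ID", ";"]),
               (0.5, ["assign", "ID", "=", "<expr>", ";"])],
    "<expr>": [(0.6, ["ID"]), (0.4, ["<expr>", "&", "<expr>"])],
}


def sample(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 6) -> List[str]:
    """Expand `symbol` top-down, choosing productions by their weights."""
    if symbol not in TOY_PCFG:
        return [symbol]  # terminal: emit as-is
    rules = TOY_PCFG[symbol]
    if depth >= max_depth:
        rhs = rules[0][1]  # fall back to the (assumed) non-recursive production
    else:
        rhs = rng.choices([r for _, r in rules], weights=[w for w, _ in rules], k=1)[0]
    tokens: List[str] = []
    for sym in rhs:
        tokens.extend(sample(sym, rng, depth + 1, max_depth))
    return tokens


def generate_skeleton(seed: int) -> str:
    """Return one space-separated skeleton; the same seed gives the same output."""
    return " ".join(sample("<module>", random.Random(seed)))


if __name__ == "__main__":
    for s in range(3):
        print(generate_skeleton(seed=s))
```

Capping the depth and falling back to a non-recursive production is just one simple way to make sampling terminate; it is a choice of this sketch, not something the paper prescribes.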
Abstract
Testing Electronic Design Automation (EDA) tools relies on benchmarks: designs written in Hardware Description Languages (HDLs) such as Verilog, SystemVerilog, or VHDL. Although collections of benchmarks for these languages exist, they are typically limited in size. This scarcity has recently drawn more attention because of the growing need to train large language models in this domain. To address this limitation, this paper presents a methodology and a corresponding tool for generating realistic Verilog designs. The tool, ChiGen, was originally developed to test the Jasper® Formal Verification Platform, a product of Cadence Design Systems. Now released as open-source software, ChiGen has identified zero-day bugs in a range of tools, including Verible, Verilator, and Yosys. This paper outlines the principles behind ChiGen's design, focusing on three aspects: (i) generation guided by probabilistic grammars, (ii) type inference via the Hindley-Milner algorithm, and (iii) code injection enabled by data-flow analysis. When run on standard hardware, ChiGen outperforms existing Verilog fuzzers such as Verismith, TransFuzz, and VlogHammer in structural diversity, code coverage, and bug-finding ability.
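The data-flow side can be illustrated with the textbook iterative reaching-definitions analysis over a control-flow graph. This Python sketch is hypothetical. The CFG encoding, the names `reaching_definitions` and `injectable_operands`, and the idea of drawing operands for injected code from variables whose definitions reach the insertion point are assumptions made for illustration, not a description of ChiGen's implementation.

```python
from typing import Dict, List, Set, Tuple

# A definition is (label, variable): the statement `label` assigns `variable`.
Definition = Tuple[str, str]


def reaching_definitions(
    blocks: Dict[str, List[Definition]],  # block -> definitions, in program order
    preds: Dict[str, List[str]],          # block -> predecessor blocks
) -> Dict[str, Set[Definition]]:
    """Classic forward may-analysis; returns the IN set of every block."""
    all_defs = {d for stmts in blocks.values() for d in stmts}
    gen: Dict[str, Set[Definition]] = {}
    kill: Dict[str, Set[Definition]] = {}
    for b, stmts in blocks.items():
        last: Dict[str, Definition] = {}
        for label, var in stmts:
            last[var] = (label, var)  # a later def of the same var overwrites an earlier one
        gen[b] = set(last.values())
        # Kill every other definition of a variable that this block (re)defines.
        kill[b] = {d for d in all_defs if d[1] in last and d not in gen[b]}
    in_sets: Dict[str, Set[Definition]] = {b: set() for b in blocks}
    out_sets: Dict[str, Set[Definition]] = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for b in blocks:
            new_in = set().union(*(out_sets[p] for p in preds.get(b, [])))
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets


def injectable_operands(in_sets: Dict[str, Set[Definition]], block: str) -> List[str]:
    """Hypothetical policy: variables with at least one reaching definition."""
    return sorted({var for _, var in in_sets[block]})


if __name__ == "__main__":
    # entry -> {then, else} -> join
    blocks = {"entry": [("d1", "a"), ("d2", "b")], "then": [("d3", "a")],
              "else": [("d4", "c")], "join": []}
    preds = {"then": ["entry"], "else": ["entry"], "join": ["then", "else"]}
    ins = reaching_definitions(blocks, preds)
    print(sorted(ins["join"]))
    print(injectable_operands(ins, "join"))
```

Because reaching definitions is a "may" analysis, `c` counts as available at `join` even though only one path defines it. A generator that needs a definition on every path would use a "must" analysis instead.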
