TB or Not TB: Coverage-Driven Direct Preference Optimization for Verilog Stimulus Generation
Bardia Nadimi, Khashayar Filom, Deming Chen, Hao Zheng
TL;DR
TB or not TB introduces a Coverage-Driven Direct Preference Optimization (CD-DPO) framework for automated Verilog stimulus generation. The authors build PairaNet from PyraNet and fine-tune Qwen-based generators on offline, coverage-weighted pairwise preferences, which aligns testbench generation with quantitative verification goals. Evaluated on CVDP CID12 with Riviera-Pro, TB or not TB achieves up to $77.27\%$ code-coverage gains over open-source baselines and up to $56.78\%$ over a commercial model in best-of-20 results. The work also reports that larger models benefit more from CD-DPO. Because the preference pairs are labeled offline, training keeps the simulator out of the optimization loop, which reduces reliance on online reinforcement learning while improving verification quality.
Abstract
With the rapid advancement of Large Language Models (LLMs), there is growing interest in applying them to hardware design and verification. Within this flow, design verification remains the most time-consuming and resource-intensive phase, and generating effective stimuli for the design under test (DUT) is both critical and labor-intensive. We present TB or not TB, a framework for automated stimulus generation using LLMs fine-tuned through Coverage-Driven Direct Preference Optimization (CD-DPO). To enable preference-based training, we introduce PairaNet, a dataset derived from PyraNet that pairs high- and low-quality testbenches labeled using simulation-derived coverage metrics. The proposed CD-DPO method integrates quantitative coverage feedback directly into the optimization objective, guiding the model toward generating stimuli that maximize verification coverage. Experiments on the CVDP CID12 benchmark show that TB or not TB outperforms both open-source and commercial baselines, achieving up to $77.27\%$ improvement in code coverage. These results demonstrate the effectiveness of coverage-driven preference optimization for LLM-based hardware verification.
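The abstract does not spell out the CD-DPO objective, so the following is only an illustrative sketch of how coverage feedback could enter a pairwise preference loss. It assumes that each candidate testbench already carries a simulation-derived coverage score in [0, 1], that pairs are formed when the coverage gap exceeds a threshold, and that the standard DPO loss is scaled by that gap. The function names, the `min_gap` threshold, and the gap-as-weight choice are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def build_pairs(candidates, min_gap=0.05):
    """Form (chosen, rejected, gap) triples from scored testbenches.

    candidates: list of (testbench_text, coverage) with coverage in [0, 1].
    min_gap: hypothetical threshold; pairs with a smaller coverage
             difference are skipped as too ambiguous to rank.
    """
    pairs = []
    for (tb_a, cov_a), (tb_b, cov_b) in combinations(candidates, 2):
        gap = abs(cov_a - cov_b)
        if gap < min_gap:
            continue
        chosen, rejected = (tb_a, tb_b) if cov_a > cov_b else (tb_b, tb_a)
        pairs.append((chosen, rejected, gap))
    return pairs


def weighted_dpo_loss(logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected,
                      gap, beta=0.1):
    """Standard DPO loss for one pair, scaled by the coverage gap.

    The logp_* arguments are sequence log-probabilities under the policy
    and the frozen reference model. Scaling by `gap` is an assumption of
    this sketch, not the paper's stated formulation.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable -log(sigmoid(margin)).
    if margin > 0:
        nll = math.log1p(math.exp(-margin))
    else:
        nll = -margin + math.log1p(math.exp(margin))
    return gap * nll


if __name__ == "__main__":
    scored = [("tb_a", 0.90), ("tb_b", 0.40), ("tb_c", 0.88)]
    for chosen, rejected, gap in build_pairs(scored):
        loss = weighted_dpo_loss(-12.0, -15.0, -13.0, -14.0, gap)
        print(chosen, rejected, round(gap, 2), round(loss, 4))
```

In this sketch, pairs with a larger coverage difference contribute more to the loss, and no simulator call is needed once the scores are recorded, which matches the offline setting described above.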
