QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

Yutong Wu; Chenrui Cao; Pengwei Jin; Di Huang; Rui Zhang; Xishan Zhang; Zidong Du; Qi Guo; Xing Hu

Abstract

SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies use general-purpose LLMs to translate natural-language properties into SVAs (NL2SVA), but these models perform poorly because training data is limited. We propose a data synthesis framework that tackles two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods for determining NL-SVA semantic equivalence. For the former, large-scale open-source RTL code guides LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.
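
As a rough illustration of how bidirectional translation could act as a data selection filter, the sketch below translates each SVA to natural language, translates that description back to an SVA, and keeps the NL-SVA pair only if the round trip returns an assertion judged equivalent to the original. This is a minimal sketch under stated assumptions, not the authors' pipeline: `select_pairs`, the toy lookup tables standing in for the SVA2NL and NL2SVA models, and the whitespace-insensitive string comparison standing in for a real equivalence check are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def normalize(sva: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(sva.split())


def select_pairs(
    svas: Iterable[str],
    sva_to_nl: Callable[[str], str],
    nl_to_sva: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Keep (NL, SVA) pairs whose round trip SVA -> NL -> SVA is judged equivalent."""
    kept = []
    for sva in svas:
        nl = sva_to_nl(sva)          # hypothetical SVA2NL step
        back = nl_to_sva(nl)         # hypothetical NL2SVA step
        if equivalent(sva, back):    # assumption: any equivalence oracle can be plugged in
            kept.append((nl, sva))
    return kept


# Toy stand-ins for the two translation models (assumptions, for illustration only).
SVA_REQ_ACK = "assert property (@(posedge clk) req |-> ##1 ack);"
SVA_MUTEX = "assert property (@(posedge clk) !(a && b));"

toy_sva_to_nl = {
    SVA_REQ_ACK: "ack follows req one cycle later",
    SVA_MUTEX: "a and b are related",  # vague description that loses the meaning
}
toy_nl_to_sva = {
    "ack follows req one cycle later": "assert property (@(posedge clk)  req |-> ##1 ack);",
    "a and b are related": "assert property (@(posedge clk) a |-> b);",
}

kept = select_pairs(
    [SVA_REQ_ACK, SVA_MUTEX],
    sva_to_nl=toy_sva_to_nl.__getitem__,
    nl_to_sva=toy_nl_to_sva.__getitem__,
    # Stand-in check: textual equality after whitespace normalization.
    equivalent=lambda a, b: normalize(a) == normalize(b),
)
print(kept)
```

In this toy run only the request/acknowledge pair survives, because the vague description of the second assertion does not translate back to the original. Passing the equivalence check as a parameter keeps the filter independent of how equivalence is actually decided.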

Paper Structure (15 sections, 3 equations, 4 figures, 5 tables)

Introduction
Related Work
Methods
SVA Synthesis from Real-World RTL Code
Bidirectional Selection for NL-SVA Pairs
Further Data Quality Refinement
Supervised Fine-Tuning
Experiments
Implementation Details
Experimental Settings
Main Results
Ablation Studies
CodeV-SVA in End-to-End Verification
Conclusion
Acknowledgements

Figures (4)

Figure 1: The relationship between data size and average 3-gram TF-IDF distance of different SVA sources.
Figure 2: An example of bidirectional data selection.
Figure 3: The overview of our data synthesis and training framework.
Figure 4: Concise prompts and examples of NL verification property analysis (Sec. "SVA Synthesis from Real-World RTL Code") and SVA2NL (Sec. "Bidirectional Selection for NL-SVA Pairs").
