QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression

Lei Huang, Rui Zhang, Jiaming Guo, Yang Zhang, Di Huang, Shuyao Cheng, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

TL;DR

QiMeng-CRUX addresses the NL-to-Verilog gap by introducing CRUX, a structured intermediate space comprising Module Interface, Core Functions, and Key Considerations. Training follows a two-stage framework: Joint Expression Modeling (NL→CRUX/Verilog via RealSpec and SFT), then Dual-Space Optimization (CRUX-Enhanced GRPO with CRUX-Reward and Code-Reward). Results on the VerilogEval and RTLLM benchmarks show state-of-the-art performance, and CRUX remains beneficial when used as a prompt for other models. The work demonstrates that semantically structured guidance can substantially improve synthesizable Verilog generation and generalize across tasks and models.

Abstract

Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. However, existing approaches often rely on free-form natural language descriptions that are ambiguous, redundant, and unstructured, which poses significant challenges for downstream Verilog code generation. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. To bridge this gap, we introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing the expression for precise Verilog code generation. We further design a two-stage training framework, comprising Joint Expression Modeling and Dual-Space Optimization, to enhance the quality of both CRUX and Verilog code. Experiments across multiple Verilog generation benchmarks demonstrate that our model, CRUX-V, achieves state-of-the-art performance among general models, particularly on challenging design tasks. Furthermore, the CRUX space proves transferable and beneficial when used as input prompts for other code models, highlighting its effectiveness in narrowing the gap between free-form natural language descriptions and precise Verilog generation.
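
To make the idea of a structured intermediate space concrete, here is a minimal Python sketch of how a CRUX record with its three parts (Module Interface, Core Functions, Key Considerations) might be represented and turned into a prompt. The class name `Crux`, the function `build_prompt`, the rendered text layout, and the counter example are all hypothetical illustrations, not the authors' format. Only the three component names and the mode names `crux_only` and `des_with_crux` come from the paper's summary and figure captions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Crux:
    """Hypothetical container for the three CRUX components."""
    module_interface: str
    core_functions: List[str] = field(default_factory=list)
    key_considerations: List[str] = field(default_factory=list)

    def render(self) -> str:
        # The section labels mirror the component names; the layout is assumed.
        lines = ["Module Interface:", self.module_interface, "", "Core Functions:"]
        lines += [f"- {item}" for item in self.core_functions]
        lines += ["", "Key Considerations:"]
        lines += [f"- {item}" for item in self.key_considerations]
        return "\n".join(lines)


def build_prompt(description: str, crux: Crux, mode: str = "des_with_crux") -> str:
    """Assemble a prompt in one of the two modes named in Figure 5."""
    if mode == "crux_only":
        return crux.render()
    if mode == "des_with_crux":
        return f"{description}\n\n{crux.render()}"
    raise ValueError(f"unknown mode: {mode}")


# Illustrative example only; not taken from any benchmark.
example = Crux(
    module_interface="module counter(input clk, input rst, output reg [3:0] q);",
    core_functions=["Increment q on each rising clock edge."],
    key_considerations=["rst is synchronous and active high."],
)
```

Calling `build_prompt("Build a 4-bit counter.", example)` returns the free-form description followed by the rendered CRUX block, while `mode="crux_only"` returns the CRUX block alone.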

Paper Structure

This paper contains 46 sections, 6 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Descriptions of the Code-Completion (CC) and Spec-to-RTL (SR) tasks in the VerilogEval-v2 benchmark convey equivalent design intent, but differ in structure. The top portion shows an example, while the bottom presents the performance of several general-purpose and domain-specific code LLMs on the whole benchmark. Results show that expression structure significantly impacts model performance, with SR-Recon consistently outperforming SR, and CC benefiting CodeVs under more constrained input formats.
  • Figure 2: An example from the VerilogEval-v2 Spec-to-RTL benchmark shows that the original description lacks explicit emphasis on critical design details. Feeding this directly to LLMs often leads to incorrect implementations due to misinterpretation of key details. While the model possesses the capability to generate correct code, the ambiguity and underspecification in the input hinder accurate realization. In contrast, providing the model with CRUX enables more reliable alignment with the desired design specification.
  • Figure 3: Overview of the two-stage training process. Stage I involves dataset categorization and reconstruction, which is used for supervised fine-tuning (SFT). Stage II applies CRUX-Enhanced GRPO for RL-based post-training.
  • Figure 4: We apply different processing pipelines to the three categories. RealSpec uses prefix/suffix augmentation and interface degradation for variation, while CRUX is constructed mainly via LLMs.
  • Figure 5: Using CRUX alone (crux_only) already leads to notable gains compared to using the original specification directly. Further combining CRUX with the original descriptions (des_with_crux) yields the best performance, even outperforming the Code-Completion task in most cases.
  • ...and 3 more figures
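
Stage II in Figure 3 applies CRUX-Enhanced GRPO, and the TL;DR names two reward signals, CRUX-Reward and Code-Reward. The sketch below shows the standard group-relative advantage used in GRPO (each sampled response's reward is normalized against the mean and standard deviation of its group). How the paper actually combines the two rewards is not stated in this summary, so the weighted sum in `combined_reward`, the `crux_weight` parameter, and the use of the population standard deviation are assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import List


def combined_reward(crux_reward: float, code_reward: float,
                    crux_weight: float = 0.5) -> float:
    """Assumed combination: a convex mix of the two reward signals."""
    return crux_weight * crux_reward + (1.0 - crux_weight) * code_reward


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice here
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses to one prompt, as (crux_reward, code_reward) pairs.
group = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
rewards = [combined_reward(c, v) for c, v in group]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to form a baseline.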