VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning

Zhuorui Zhao, Bing Li, Grace Li Zhang, Ulf Schlichtmann

TL;DR

This paper tackles the challenge of functional correctness in Verilog code generated by large language models. It introduces VFocus, a training-free framework that sharpens reasoning through three stages: (i) pre-ranking with Density-guided Filtering to select focused candidates, (ii) simulation-based self-consistency ranking to identify behaviorally stable solutions, and (iii) post-ranking contradiction mining with reasoning-augmented refinement to correct edge cases. Empirical results on the VerilogEval-Human benchmark show significant pass@1 improvements across multiple reasoning LLMs, including both open-source and proprietary models, and demonstrate robustness across varying sample sizes. By removing reliance on human-written testbenches and leveraging structured reasoning at decision points, VFocus offers scalable enhancements for AI-assisted hardware design automation.

Abstract

Large Language Models (LLMs) have shown impressive potential in generating Verilog code, but ensuring functional correctness remains a challenge. Existing approaches often rely on self-consistency or simulation feedback to select the best candidate, but they miss opportunities to focus LLM reasoning on the most informative parts of the design. We propose VFocus, a three-stage framework that enhances Verilog generation by sharpening the focus of LLM reasoning onto critical decision points in the code generation process. In the pre-ranking stage, VFocus generates multiple code candidates through LLM prompting, retries until the outputs are syntactically valid, and applies Density-guided Filtering to retain candidates that fall within the "reasoning sweet spot" for functional correctness. In the ranking stage, we simulate each code candidate using an automatically generated testbench and apply self-consistency-based clustering to identify the most consistent outputs. Finally, in the post-ranking refinement stage, VFocus performs inconsistency mining on top-ranked candidates and invokes reasoning-augmented LLM prompts for candidate refinement. Experiments on the VerilogEval-Human benchmark show that VFocus significantly improves pass@1 correctness across multiple reasoning LLMs, demonstrating its effectiveness in enhancing Verilog generation for complex hardware design tasks.
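The ranking stage described in the abstract can be illustrated with a minimal sketch. It assumes each candidate has already been simulated against the generated testbench and that its outputs are available as a list of strings; the exact-match grouping rule, the function name `rank_by_self_consistency`, and the candidate names are hypothetical choices made for illustration, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rank_by_self_consistency(sim_outputs: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Group candidates by identical simulation output, largest group first.

    Assumption: two candidates are "consistent" only when their output
    traces match exactly. The paper may use a different criterion.
    """
    clusters: Dict[tuple, List[str]] = defaultdict(list)
    for name, trace in sim_outputs.items():
        # The full output trace acts as the behavioral signature.
        clusters[tuple(trace)].append(name)
    # sorted() is stable, so equally sized clusters keep insertion order.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical traces: one output value per testbench step.
traces = {
    "cand_a": ["0", "1", "1"],
    "cand_b": ["0", "1", "1"],
    "cand_c": ["0", "0", "1"],
}
ranked = rank_by_self_consistency(traces)
top_cluster = ranked[0]  # candidates that agree with the most peers
```

In this sketch the largest cluster is treated as the most behaviorally stable group; a refinement stage would then examine where the top-ranked clusters disagree.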

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Current training-free approaches: (a) paradigm/prompt engineering, (b) golden-testbench feedback, (c) self-consistency.
  • Figure 2: Overall framework of VFocus.
  • Figure 3: Functional correctness as the number of samples increases across different models.
  • Figure 4: Increase in functional correctness as the number of samples increases.