Table of Contents
Fetching ...

Are LLMs Ready for Practical Adoption for Assertion Generation?

Vaishnavi Pulavarthi, Deeksha Nandal, Soham Dan, Debjit Pal

TL;DR

The paper tackles the readiness of LLMs for practical hardware assertion generation by introducing AssertionBench, a 100-design Verilog benchmark, and proposing AssertionLLM, a fine-tuned LLM for improved assertion correctness. It formalizes assertions as $P = \mathcal{G}(A \\rightarrow C)$ with $A = \\bigwedge_{i=0}^m \\mathcal{X}^i(A^i)$ and $C = \\mathcal{X}^n(C^n)$, $n \\ge m$, and demonstrates that COTS LLMs produce many syntactic/semantic errors. Finetuning (from CodeLLaMa 2 and LLaMa3-70B) significantly boosts correctness (e.g., CodeLLaMa 2 shows up to +29% and +38% proven assertions and substantial CEX reductions), though syntactic errors persist, underscoring the need for richer HDL understanding and auxiliary design representations. The work provides a clear path toward practical AI-assisted assertion generation and outlines key directions for extending coverage, exploiting design artifacts, and addressing security considerations to accelerate industrial adoption.

Abstract

Assertions have been the de facto collateral for simulation-based and formal verification of hardware designs for over a decade. The quality of hardware verification, i.e., detection and diagnosis of corner-case design bugs, is critically dependent on the quality of the assertions. With the onset of generative AI such as Transformers and Large-Language Models (LLMs), there has been a renewed interest in developing novel, effective, and scalable techniques of generating functional and security assertions from design source code. While there have been recent works that use commercial-of-the-shelf (COTS) LLMs for assertion generation, there is no comprehensive study in quantifying the effectiveness of LLMs in generating syntactically and semantically correct assertions. In this paper, we first discuss AssertionBench from our prior work, a comprehensive set of designs and assertions to quantify the goodness of a broad spectrum of COTS LLMs for the task of assertion generations from hardware design source code. Our key insight was that COTS LLMs are not yet ready for prime-time adoption for assertion generation as they generate a considerable fraction of syntactically and semantically incorrect assertions. Motivated by the insight, we propose AssertionLLM, a first of its kind LLM model, specifically fine-tuned for assertion generation. Our initial experimental results show that AssertionLLM considerably improves the semantic and syntactic correctness of the generated assertions over COTS LLMs.

Are LLMs Ready for Practical Adoption for Assertion Generation?

TL;DR

The paper tackles the readiness of LLMs for practical hardware assertion generation by introducing AssertionBench, a 100-design Verilog benchmark, and proposing AssertionLLM, a fine-tuned LLM for improved assertion correctness. It formalizes assertions as with and , , and demonstrates that COTS LLMs produce many syntactic/semantic errors. Finetuning (from CodeLLaMa 2 and LLaMa3-70B) significantly boosts correctness (e.g., CodeLLaMa 2 shows up to +29% and +38% proven assertions and substantial CEX reductions), though syntactic errors persist, underscoring the need for richer HDL understanding and auxiliary design representations. The work provides a clear path toward practical AI-assisted assertion generation and outlines key directions for extending coverage, exploiting design artifacts, and addressing security considerations to accelerate industrial adoption.

Abstract

Assertions have been the de facto collateral for simulation-based and formal verification of hardware designs for over a decade. The quality of hardware verification, i.e., detection and diagnosis of corner-case design bugs, is critically dependent on the quality of the assertions. With the onset of generative AI such as Transformers and Large-Language Models (LLMs), there has been a renewed interest in developing novel, effective, and scalable techniques of generating functional and security assertions from design source code. While there have been recent works that use commercial-of-the-shelf (COTS) LLMs for assertion generation, there is no comprehensive study in quantifying the effectiveness of LLMs in generating syntactically and semantically correct assertions. In this paper, we first discuss AssertionBench from our prior work, a comprehensive set of designs and assertions to quantify the goodness of a broad spectrum of COTS LLMs for the task of assertion generations from hardware design source code. Our key insight was that COTS LLMs are not yet ready for prime-time adoption for assertion generation as they generate a considerable fraction of syntactically and semantically incorrect assertions. Motivated by the insight, we propose AssertionLLM, a first of its kind LLM model, specifically fine-tuned for assertion generation. Our initial experimental results show that AssertionLLM considerably improves the semantic and syntactic correctness of the generated assertions over COTS LLMs.

Paper Structure

This paper contains 12 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: A Verilog code for a 2-port Arbiterpal2020tcad.
  • Figure 2: Assertion status based on pre-condition and post-condition evaluation. CEX: Counter example.
  • Figure 3: Design details in the test set in terms of the number of lines of code (excluding comments and blank lines).
  • Figure 4: Framework to evaluate LLMs for assertion generationpulavarthi2025naacl. JG: JasperGold Formal Property Verification Engine.
  • Figure 5: An example of the prompt for 1-shot learningpulavarthi2025naacl. The example consists of a tuple, a Verilog design ( Program 1) and a set of formally verified assertions for the design ( Assertions 1). The Test Program is the Verilog design for which we generate assertions using the trained LLM.
  • ...and 4 more figures