Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?
Romy Peled, Daniel Kroening, Michael Tautschnig, Yakir Vizel
TL;DR
This work addresses the challenge of performing inductive proofs for hardware verification with limited manual effort by introducing a neurosymbolic framework that combines two LLM prompting strategies with a formal reasoning step to extract inductive strengthenings. The approach generates candidate lemmas and uses a formal checker to assemble a safe inductive invariant for a given RTL design and property; evaluation on 94 task pairs yields a success rate of $87\%$, demonstrating practical potential. Key contributions include the dual prompting lemma-mining framework (Non-Agentic and Agentic), the FindIndStrength algorithm for selecting inductive strengthenings, and an empirical study contrasting prompting strategies and highlighting limitations such as SVA syntax gaps and data-sourcing biases. The results suggest that LLM-assisted inductive proofs can reduce manual FV workload and broaden hardware verification coverage, with future work targeting larger designs, multiple properties, and industrial deployment.
Abstract
Large Language Models (LLMs) have shown potential for solving mathematical tasks. We show that LLMs can be utilized to generate proofs by induction for hardware verification and thereby replace some of the manual work done by Formal Verification engineers and deliver industrial value. We present a neurosymbolic approach that includes two prompting frameworks to generate candidate invariants, which are checked using a formal, symbolic tool. Our results indicate that with sufficient reprompting, LLMs are able to generate inductive arguments for mid-size open-source RTL designs. For $87\%$ of our problem set, at least one of the prompt setups succeeded in producing a provably correct inductive argument.
