Table of Contents
Fetching ...

IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

Mingkai Miao, Guangyu Hu, Ziyi Yang, Hongce Zhang

Abstract

IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof to safety. In practice, the performance of IC3 is dominated by a large web of interacting heuristics and implementation choices, making manual tuning costly, brittle, and hard to reproduce. This paper presents IC3-Evolve, an automated offline code-evolution framework that utilizes an LLM to propose small, slot-restricted and auditable patches to an IC3 implementation. Crucially, every candidate patch is admitted only through proof- /witness-gated validation: SAFE runs must emit a certificate that is independently checked, and UNSAFE runs must emit a replayable counterexample trace, preventing unsound edits from being deployed. Since the LLM is used only offline, the deployed artifact is a standalone evolved checker with zero ML/LLM inference overhead and no runtime model dependency. We evolve on the public hardware model checking competition (HWMCC) benchmark and evaluate the generalizability on unseen public and industrial model checking benchmarks, showing that IC3-Evolve can reliably discover practical heuristic improvements under strict correctness gates.

IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

Abstract

IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof to safety. In practice, the performance of IC3 is dominated by a large web of interacting heuristics and implementation choices, making manual tuning costly, brittle, and hard to reproduce. This paper presents IC3-Evolve, an automated offline code-evolution framework that utilizes an LLM to propose small, slot-restricted and auditable patches to an IC3 implementation. Crucially, every candidate patch is admitted only through proof- /witness-gated validation: SAFE runs must emit a certificate that is independently checked, and UNSAFE runs must emit a replayable counterexample trace, preventing unsound edits from being deployed. Since the LLM is used only offline, the deployed artifact is a standalone evolved checker with zero ML/LLM inference overhead and no runtime model dependency. We evolve on the public hardware model checking competition (HWMCC) benchmark and evaluate the generalizability on unseen public and industrial model checking benchmarks, showing that IC3-Evolve can reliably discover practical heuristic improvements under strict correctness gates.

Paper Structure

This paper contains 41 sections, 3 figures, 4 tables, 3 algorithms.

Figures (3)

  • Figure 1: Comparison of learning-augmented IC3, LLM-in-the-loop verification, and our approach. IC3-Evolve evolves IC3 code offline and promotes edits only via proof-/witness-gated validation, yielding a standalone checker with no runtime model dependency.
  • Figure 2: Overview of IC3-Evolve. Compass&Jump selects slot(s); a programmer proposes a slot-restricted patch (with hypothesis and fallback). A tool-using evaluator builds and witness-validates the challenger, benchmarks it (e.g., PAR2/solved), and returns a MoveSet. Challengers are promoted only if they pass hard gates and improve under a regression budget. See agent prompt example in \ref{['app:prompts']} in Appendix.
  • Figure 3: Representative evolution trace on the HWMCC optimization workload. Bars: attempted challengers; line: best-so-far PAR2 (lower is better). Green bars: accepted new best; shading: editable slot(s) per phase.