Table of Contents
Fetching ...

A Neurosymbolic Approach to Loop Invariant Generation via Weakest Precondition Reasoning

Daragh King, Vasileios Koutavas, Laura Kovacs

TL;DR

Loop invariant generation is a major bottleneck in automated program verification. NeuroInv couples a neural reasoning module that uses weakest precondition backward chaining with a verification-guided symbolic module that repairs invariants via counterexamples from OpenJML, forming a robust neurosymbolic loop. Across 150 Java programs, NeuroInv achieves a 99.5% success rate, significantly outperforming ad-hoc prompting and SpecGen, and scales to a harder benchmark with multi-loop programs. The dual-level refinement and repair demonstrate that integrating WP reasoning with formal verification substantially improves robustness and scalability in LLM-assisted invariant generation.

Abstract

Loop invariant generation remains a critical bottleneck in automated program verification. Recent work has begun to explore the use of Large Language Models (LLMs) in this area, yet these approaches tend to lack a reliable and structured methodology, with little reference to existing program verification theory. This paper presents NeuroInv, a neurosymbolic approach to loop invariant generation. NeuroInv comprises two key modules: (1) a neural reasoning module that leverages LLMs and Hoare logic to derive and refine candidate invariants via backward-chaining weakest precondition reasoning, and (2) a verification-guided symbolic module that iteratively repairs invariants using counterexamples from OpenJML. We evaluate NeuroInv on a comprehensive benchmark of 150 Java programs, encompassing single and multiple (sequential) loops, multiple arrays, random branching, and noisy code segments. NeuroInv achieves a $99.5\%$ success rate, substantially outperforming the other evaluated approaches. Additionally, we introduce a hard benchmark of $10$ larger multi-loop programs (with an average of $7$ loops each); NeuroInv's performance in this setting demonstrates that it can scale to more complex verification scenarios.

A Neurosymbolic Approach to Loop Invariant Generation via Weakest Precondition Reasoning

TL;DR

Loop invariant generation is a major bottleneck in automated program verification. NeuroInv couples a neural reasoning module that uses weakest precondition backward chaining with a verification-guided symbolic module that repairs invariants via counterexamples from OpenJML, forming a robust neurosymbolic loop. Across 150 Java programs, NeuroInv achieves a 99.5% success rate, significantly outperforming ad-hoc prompting and SpecGen, and scales to a harder benchmark with multi-loop programs. The dual-level refinement and repair demonstrate that integrating WP reasoning with formal verification substantially improves robustness and scalability in LLM-assisted invariant generation.

Abstract

Loop invariant generation remains a critical bottleneck in automated program verification. Recent work has begun to explore the use of Large Language Models (LLMs) in this area, yet these approaches tend to lack a reliable and structured methodology, with little reference to existing program verification theory. This paper presents NeuroInv, a neurosymbolic approach to loop invariant generation. NeuroInv comprises two key modules: (1) a neural reasoning module that leverages LLMs and Hoare logic to derive and refine candidate invariants via backward-chaining weakest precondition reasoning, and (2) a verification-guided symbolic module that iteratively repairs invariants using counterexamples from OpenJML. We evaluate NeuroInv on a comprehensive benchmark of 150 Java programs, encompassing single and multiple (sequential) loops, multiple arrays, random branching, and noisy code segments. NeuroInv achieves a success rate, substantially outperforming the other evaluated approaches. Additionally, we introduce a hard benchmark of larger multi-loop programs (with an average of loops each); NeuroInv's performance in this setting demonstrates that it can scale to more complex verification scenarios.

Paper Structure

This paper contains 26 sections, 4 equations, 5 figures, 4 tables, 3 algorithms.

Figures (5)

  • Figure 1: High-level architecture of NeuroInv; $M$ is the input program, $M'$ is the input program with the neurally-derived loop invariants, and $M"$ represents $M$ with the correct loop invariants as JML annotations.
  • Figure 2: The two left-most entities depict the link types used within the neural module's backward chaining WP reasoning. The entity on the right shows the chain between a "Loop-Free" and "Loop" link, for a hypothetical program segment which is comprised of a while loop followed by loop-free code.
  • Figure 3: Average success rates (with std. deviation bars), obtained across 5 experimental runs.
  • Figure 4: The proportions of programs which do (and do not) require loop invariant refinements across each experimental run (solid lines). Additionally, the dashed line depicts the total number of refinements required per experimental run.
  • Figure 5: The proportions of refinements induced by Implication 1 and Implication 2 failures, evaluated across 5 experimental runs.

Theorems & Definitions (6)

  • Definition 1: Hoare Triple (Partial Correctness)
  • Definition 2: Weakest Precondition
  • Definition 3: Initialisation
  • Definition 4: WP of Sequence
  • Definition 5: Preservation
  • Definition 6: Exit