Table of Contents
Fetching ...

Sound and Complete Invariant-Based Heap Encodings (Technical Report)

Zafer Esen, Philipp Rümmer, Tjark Weber

TL;DR

This work tackles the verification of programs that manipulate mutable, heap-allocated data by introducing a relational heap encoding that eliminates the heap and reduces verification to integer-based constrained Horn clauses with uninterpreted predicates and prophecy variables. It presents two complete, sound encodings, Enc_R and Enc_RW, along with formal proofs and extensions for scalability, and demonstrates their effectiveness on SV-COMP-like benchmarks using CHC back-ends such as Sea-Horn and Tri-Cera. The approach enables automatic verification of previously unreachable heap-manipulating programs, without manual heap annotations, by translating heap behavior into logical relations over program inputs and read/write indices. The framework supports refinements (meta-information, scope variables) and property-tailored variants, offering a flexible, equi-safe foundation for advancing heap verification and guiding future extensions to arrays, concurrency, and more complex data structures.

Abstract

Verification of programs operating on mutable, heap-allocated data structures poses significant challenges due to potentially unbounded structures like linked lists and trees. In this paper, we present a novel relational heap encoding leveraging uninterpreted predicates and prophecy variables, reducing heap verification tasks to satisfiability checks over integers in constrained Horn clauses (CHCs). To the best of our knowledge, our approach is the first invariant-based method that achieves both soundness and completeness for heap-manipulating programs. We provide formal proofs establishing the correctness of our encodings. Through an experimental evaluation we demonstrate that our method significantly extends the capability of existing CHC-based verification tools, allowing automatic verification of programs with heap previously unreachable by state-of-the-art tools.

Sound and Complete Invariant-Based Heap Encodings (Technical Report)

TL;DR

This work tackles the verification of programs that manipulate mutable, heap-allocated data by introducing a relational heap encoding that eliminates the heap and reduces verification to integer-based constrained Horn clauses with uninterpreted predicates and prophecy variables. It presents two complete, sound encodings, Enc_R and Enc_RW, along with formal proofs and extensions for scalability, and demonstrates their effectiveness on SV-COMP-like benchmarks using CHC back-ends such as Sea-Horn and Tri-Cera. The approach enables automatic verification of previously unreachable heap-manipulating programs, without manual heap annotations, by translating heap behavior into logical relations over program inputs and read/write indices. The framework supports refinements (meta-information, scope variables) and property-tailored variants, offering a flexible, equi-safe foundation for advancing heap verification and guiding future extensions to arrays, concurrency, and more complex data structures.

Abstract

Verification of programs operating on mutable, heap-allocated data structures poses significant challenges due to potentially unbounded structures like linked lists and trees. In this paper, we present a novel relational heap encoding leveraging uninterpreted predicates and prophecy variables, reducing heap verification tasks to satisfiability checks over integers in constrained Horn clauses (CHCs). To the best of our knowledge, our approach is the first invariant-based method that achieves both soundness and completeness for heap-manipulating programs. We provide formal proofs establishing the correctness of our encodings. Through an experimental evaluation we demonstrate that our method significantly extends the capability of existing CHC-based verification tools, allowing automatic verification of programs with heap previously unreachable by state-of-the-art tools.

Paper Structure

This paper contains 35 sections, 7 theorems, 10 equations, 4 figures, 6 tables.

Key Result

lemma 1

For any program $p$, $T_p$ is monotonic: if $\mathcal{I}_1 \sqsubseteq \mathcal{I}_2$, then $T_p(\mathcal{I}_1) \sqsubseteq T_p(\mathcal{I}_2)$.

Figures (4)

  • Figure 1: The final shape of the linked list created in Listing \ref{['lst:mot-program-1']} for positive N. For non-positive N, the shape will contain only the last node with the value $3$.
  • Figure 2: A sound and complete encoding for heap operations using traces (\ref{['lst:trace-encoding']}) and using uninterpreted predicates (\ref{['lst:full-encoding']}). The program from \ref{['lst:mot-program-1']} is rewritten to be compatible with either encoding and is given in \ref{['lst:common-code']}.
  • Figure 3: For the program in \ref{['lst:mot-program-1']}, the upper part of the diagram illustrates the heap trace for N = 3. In the trace each cell corresponds to a write to the written (address -- Node) pair. The Nodes are represented by the rounded rectangles containing the values for the data and next fields respectively, with the number above a node showing that node's address. The lower part of the diagram shows the relationship between the trace-based and the relational heap encodings. In the relational heap encoding, last_addr implicitly quantifies over all possible addresses. When last_addr matches the address being read, the last Node value residing at that address is registered in the computed solution of the uninterpreted predicate R, represented by the arrows in the diagram. Due to the universal quantification, R will contain the union of these values. The reads can be distinguished due to the unique counter value (in the diagram the numbers appended to R#). The reads R#1 -- R#3 happen at line 31 in \ref{['lst:common-code']}, and the rest of the reads happen at line 37.
  • Figure 4: The syntax of UPLang. The language is deterministic, supports uninterpreted predicates that can be used inside $\textbf{assert}$ and $\textbf{assume}$ statements. Pointer arithmetic (i.e., arithmetic manipulation of $\mathit{Addr}$ variables) is not permitted.

Theorems & Definitions (11)

  • definition 1: Program Execution
  • definition 2: Safety
  • definition 3: Equi-safety
  • definition 4: Immediate Consequence Operator
  • lemma 1: Monotonicity of $T$
  • lemma 2: Safety under $\mathcal{I}^*$
  • lemma 3: Functional consistency of $\mathcal{I}^*(R)$
  • lemma 4: $\mathit{Enc}_{n}$ is correct
  • lemma 5: Preservation of final states by $\mathit{Enc}_{R}$
  • theorem 1: $\mathit{Enc}_{R}$ is correct
  • ...and 1 more