Automated Verification of Monotonic Data Structure Traversals in C

Matthew Sotoudeh

TL;DR

This work targets automated verification of monotonic data structure traversals (MDSTs) in C by introducing Shrinker, a verifier that exploits a scapegoating size descent technique. By running paired analyses on an input and a shrunk version, Shrinker proves safety through descent arguments, avoiding the need to track arbitrary, unbounded heap invariants common in MDSTs. The authors formalize a trace-herd abstract interpretation, present the Shrinker tool architecture and memory/numerical abstractions, and demonstrate substantial empirical gains on a large real-world MDST benchmark relative to existing tools. The results indicate that scapegoating size descent can significantly increase verification coverage for string and list traversals, with notable improvements to portfolio performance, and provide a practical path toward scalable, automated verification of C data-structure code. The work also discusses limitations and future directions, including extensions to nested MDSTs, alternative size measures, and richer memory models.

Abstract

Bespoke data structure operations are common in real-world C code. We identify one common subclass, monotonic data structure traversals (MDSTs), that iterate monotonically through the structure. For example, strlen iterates from start to end of a character array until a null byte is found, and a binary search tree insert iterates from the tree root towards a leaf. We describe a new automated verification tool, Shrinker, to verify MDSTs written in C. Shrinker uses a new program analysis strategy called scapegoating size descent, which is designed to take advantage of the fact that many MDSTs produce very similar traces when executed on an input (e.g., some large list) as when executed on a 'shrunk' version of the input (e.g., the same list but with its first element deleted). We introduce a new benchmark set containing over one hundred instances proving correctness, equivalence, and memory safety properties of dozens of MDSTs found in major C codebases including Linux, NetBSD, OpenBSD, QEMU, Git, and Musl. Shrinker significantly increases the number of monotonic string and list traversals that can be verified vs. a portfolio of state-of-the-art tools.
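To make the shrinking intuition concrete, the following is a minimal sketch, not taken from Shrinker or its benchmark set, of a strlen-style MDST together with a concrete check of one descent step. The names `mdst_strlen` and `descent_holds` are hypothetical and introduced only for illustration. The sketch tests the relationship on concrete inputs, whereas a verifier would need to establish it for all inputs.

```c
#include <assert.h>
#include <stddef.h>

/* A strlen-style monotonic traversal: the cursor only ever moves forward
 * through the array until it reaches the terminating null byte. */
size_t mdst_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical helper (an assumption of this sketch, not part of Shrinker):
 * checks one descent step on a concrete input. For a non-empty string, the
 * run on s and the run on the shrunk input s + 1 (first character dropped)
 * differ by exactly one loop iteration, so the results differ by one. */
int descent_holds(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return mdst_strlen(s) == 0;   /* base case: nothing left to shrink */
    return mdst_strlen(s) == 1 + mdst_strlen(s + 1);
}
```

For example, `descent_holds("abc")` compares the run on `"abc"` with the run on the shrunk input `"bc"`. Apart from one extra iteration at the front, the two runs visit the same bytes in the same order, which is the kind of trace similarity the abstract describes.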

Paper Structure

This paper contains 67 sections, 7 theorems, 1 figure, 1 table, and 2 algorithms.

Key Result

Lemma 1

If the algorithm returns Safe, then for any reachable trace ${\color{blue}t}$ there exists some abstract trace ${\color{red}a} \in \mathtt{seen}$ with ${\color{blue}t} \in {\color{blue}\gamma^T}({\color{red}a})$.

Figures (1)

  • Figure 1: Cactus plots. A point $(n, t)$ on the top row indicates that the tool can solve $n$ of the benchmarks in $t$ total seconds. A point $(i, t)$ on the bottom row indicates that the tool can solve its $i$th easiest benchmark in $t$ seconds; prefix-summing the bottom row gives the top row. In all cases, curves that are lower (faster) and extend further to the right (solving more problems) are better. We also give curves corresponding to the virtual-best portfolio (i.e., assuming a perfect heuristic that picks the best solver out of the four for each instance), both with (pwith) and without (pwithout) Shrinker. For strings and trees, only one other tool solved any instances, so the "portfolio without" line is identical to that tool's curve. For both strings and lists, Shrinker on its own always solves more instances than any other tool, is within the same order of magnitude of time as the other tools (sometimes faster), and leads to significant improvements in the portfolio performance. For trees, Shrinker is considerably slower than the best tool (2ls), but its inclusion in the portfolio results in solving one additional benchmark.
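The relationship between the two rows of the cactus plot, and the virtual-best portfolio, can be sketched as follows. This is an illustrative reconstruction from the caption only, not the authors' plotting code. The function names `cactus_points` and `virtual_best` are hypothetical, and the use of a negative time to mean "not solved" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort over doubles (ascending order). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the n solve times in place, so that times[i] is the time for the
 * (i+1)th easiest benchmark (bottom-row view), and writes the running
 * totals into cumulative, so that cumulative[i] is the total time needed
 * to solve i+1 benchmarks (top-row view). */
void cactus_points(double *times, double *cumulative, size_t n)
{
    assert(times != NULL && cumulative != NULL);
    qsort(times, n, sizeof *times, cmp_double);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += times[i];
        cumulative[i] = total;
    }
}

/* Virtual-best portfolio time for one instance: the fastest time among
 * the tools that solved it. A negative entry is this sketch's assumed
 * encoding of "not solved"; the result is negative if no tool solved it. */
double virtual_best(const double *tool_times, size_t n_tools)
{
    double best = -1.0;
    for (size_t k = 0; k < n_tools; k++) {
        double t = tool_times[k];
        if (t >= 0.0 && (best < 0.0 || t < best))
            best = t;
    }
    return best;
}
```

Applying `virtual_best` to every instance and then passing the solved instances' times to `cactus_points` gives the portfolio curve. Leaving one tool's column out before taking the minimum gives the corresponding "portfolio without" curve.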

Theorems & Definitions (24)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Proof
  • Theorem 1
  • Proof
  • Definition 5
  • Theorem 2
  • ...and 14 more