Automated Verification of Monotonic Data Structure Traversals in C
Matthew Sotoudeh
TL;DR
This work targets automated verification of monotonic data structure traversals (MDSTs) in C. It introduces Shrinker, a verifier built on a technique called scapegoating size descent. By analyzing a program's behavior on an input alongside its behavior on a shrunk version of that input, Shrinker proves safety through descent arguments and avoids the need to track the arbitrary, unbounded heap invariants common in MDSTs. The paper formalizes a trace-herd abstract interpretation, presents the Shrinker tool architecture and its memory and numerical abstractions, and reports substantial empirical gains over existing tools on a large benchmark of real-world MDSTs. The results indicate that scapegoating size descent can significantly increase verification coverage for string and list traversals, improving on a portfolio of state-of-the-art tools, and that it offers a practical path toward scalable, automated verification of C data-structure code. The work also discusses limitations and future directions, including extensions to nested MDSTs, alternative size measures, and richer memory models.
Abstract
Bespoke data structure operations are common in real-world C code. We identify one common subclass, monotonic data structure traversals (MDSTs), that iterate monotonically through the structure. For example, strlen iterates from start to end of a character array until a null byte is found, and a binary search tree insert iterates from the tree root towards a leaf. We describe a new automated verification tool, Shrinker, to verify MDSTs written in C. Shrinker uses a new program analysis strategy called scapegoating size descent, which is designed to take advantage of the fact that many MDSTs produce very similar traces when executed on an input (e.g., some large list) as when executed on a 'shrunk' version of the input (e.g., the same list but with its first element deleted). We introduce a new benchmark set containing over one hundred instances proving correctness, equivalence, and memory safety properties of dozens of MDSTs found in major C codebases including Linux, NetBSD, OpenBSD, QEMU, Git, and Musl. Shrinker significantly increases the number of monotonic string and list traversals that can be verified vs. a portfolio of state-of-the-art tools.
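To make the notion of an MDST and of a "shrunk" input concrete, the following is a minimal sketch in C. It is not taken from the paper's benchmark set and does not show how Shrinker works internally. The names my_strlen, list_length, struct node, and check_shrunk_inputs are hypothetical and chosen only for illustration. Both traversals move through the structure in one direction only, and running either on an input with its first element removed yields a result that differs by exactly one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strlen-style MDST: walks forward until the null byte. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Hypothetical singly linked list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical list MDST: follows next pointers from head to the end. */
size_t list_length(const struct node *head) {
    size_t n = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        n++;
    }
    return n;
}

/* Compares each traversal on an input and on the same input with its
 * first element dropped. Returns 1 if both results differ by exactly one. */
int check_shrunk_inputs(void) {
    const char *input = "hello";
    const char *shrunk = input + 1; /* same string minus its first byte */

    struct node c = {3, NULL};
    struct node b = {2, &c};
    struct node a = {1, &b};

    return my_strlen(input) == my_strlen(shrunk) + 1
        && list_length(&a) == list_length(a.next) + 1;
}
```

The relationship checked in check_shrunk_inputs is only a runtime observation on two concrete inputs. A verifier has to establish such a relationship for all inputs, which is the harder problem the abstract describes.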
