Table of Contents
Fetching ...

Do Unit Proofs Work? An Empirical Study of Compositional Bounded Model Checking for Memory Safety Verification

Paschal C. Amusuo, Owen Cochell, Taylor Le Lievre, Parth V. Patil, Aravind Machiry, James C. Davis

TL;DR

The paper investigates unit proofs for memory safety verification via compositional bounded model checking and provides the first empirical study of a uniform, systematic unit-proofing method. By developing 73 unit proofs across four embedded OSes and reintroducing 89 defects, the study shows systematic unit proofs exposed 74% of defects and uncovered 19 new ones, with average development times around 87 minutes and verification times around 61 seconds. The approach emphasizes angelic modeling and assume-guarantee reasoning to model unknowns, with completion guided by objective criteria including coverage and termination. The findings offer practical guidance for engineers and inform tooling design, supporting incremental adoption of unit proofs in development workflows and highlighting pathways for automation. Overall, unit proofing emerges as a cost-effective, scalable strategy for memory-safety verification with tangible defect-detection benefits in real-world embedded software.

Abstract

Memory safety defects pose a major threat to software reliability, enabling cyberattacks, outages, and crashes. To mitigate these risks, organizations adopt Compositional Bounded Model Checking (BMC), using unit proofs to formally verify memory safety. However, methods for creating unit proofs vary across organizations and are inconsistent within the same project, leading to errors and missed defects. In addition, unit proofing remains understudied, with no systematic development methods or empirical evaluations. This work presents the first empirical study on unit proofing for memory safety verification. We introduce a systematic method for creating unit proofs that leverages verification feedback and objective criteria. Using this approach, we develop 73 unit proofs for four embedded operating systems and evaluate their effectiveness, characteristics, cost, and generalizability. Our results show unit proofs are cost-effective, detecting 74\% of recreated defects, with an additional 9\% found with increased BMC bounds, and 19 new defects exposed. We also found that embedded software requires small unit proofs, which can be developed in 87 minutes and executed in 61 minutes on average. These findings provide practical guidance for engineers and empirical data to inform tooling design.

Do Unit Proofs Work? An Empirical Study of Compositional Bounded Model Checking for Memory Safety Verification

TL;DR

The paper investigates unit proofs for memory safety verification via compositional bounded model checking and provides the first empirical study of a uniform, systematic unit-proofing method. By developing 73 unit proofs across four embedded OSes and reintroducing 89 defects, the study shows systematic unit proofs exposed 74% of defects and uncovered 19 new ones, with average development times around 87 minutes and verification times around 61 seconds. The approach emphasizes angelic modeling and assume-guarantee reasoning to model unknowns, with completion guided by objective criteria including coverage and termination. The findings offer practical guidance for engineers and inform tooling design, supporting incremental adoption of unit proofs in development workflows and highlighting pathways for automation. Overall, unit proofing emerges as a cost-effective, scalable strategy for memory-safety verification with tangible defect-detection benefits in real-world embedded software.

Abstract

Memory safety defects pose a major threat to software reliability, enabling cyberattacks, outages, and crashes. To mitigate these risks, organizations adopt Compositional Bounded Model Checking (BMC), using unit proofs to formally verify memory safety. However, methods for creating unit proofs vary across organizations and are inconsistent within the same project, leading to errors and missed defects. In addition, unit proofing remains understudied, with no systematic development methods or empirical evaluations. This work presents the first empirical study on unit proofing for memory safety verification. We introduce a systematic method for creating unit proofs that leverages verification feedback and objective criteria. Using this approach, we develop 73 unit proofs for four embedded operating systems and evaluate their effectiveness, characteristics, cost, and generalizability. Our results show unit proofs are cost-effective, detecting 74\% of recreated defects, with an additional 9\% found with increased BMC bounds, and 19 new defects exposed. We also found that embedded software requires small unit proofs, which can be developed in 87 minutes and executed in 61 minutes on average. These findings provide practical guidance for engineers and empirical data to inform tooling design.

Paper Structure

This paper contains 45 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: A unit proof, containing models for func1 and func4, verifies the functional unit comprising func2 and func3.
  • Figure 2: Study Methodology. We first develop unit proofs in 5 steps, iterating until full coverage and no memory safety violations. Then we evaluate the five research questions.
  • Figure 3: A systematically-developed unit proof. For realism, each element is taken from a real proof. In step 1, the initial unit proof contained only the black box. Initial verification timed out due to an unconstrained function pointer. In step 2, we resolved this (adding red box). The coverage report showed gaps due to insufficient loops. In step 3, we increased the bounds of affected loops (adding green box). Finally, the error report indicated errors caused by unknown variables and functions. In step 4, we resolved these via variable preconditions (blue box) and function models (purple box).
  • Figure 4: Plot showing unit proof sizes vary by unit sizes.
  • Figure 5: Plot showing the number of variables and models in unit proofs of different sizes.
  • ...and 5 more figures