
Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing

Yile Gu, Ian Neal, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci

TL;DR

This paper tackles crash-consistency testing by addressing crash-state space explosion through representative testing, which leverages the observation that the consistency of crash states is often correlated across similar update behaviors. It introduces Pathfinder, a tool that traces applications, builds a Persistence Graph, partitions it into update-behavior subgraphs, groups them by a represents relation, and uses DPOR-backed (dynamic partial-order reduction) model checking to validate crash states. Across 8 production-ready POSIX-based and MMIO-based systems, Pathfinder finds 18 bugs (7 new) and scales better than state-of-the-art techniques, finding 4x more bugs in POSIX-based applications and 8x more in MMIO-based applications. The approach enables scalable, accurate crash-consistency testing for large software stacks, and its ideas could generalize to other domains with update-behavior semantics.
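The tracing and partitioning steps can be illustrated with a small Python sketch. It assumes, hypothetically, that each traced persistence event records the dynamic invocation (`call_id`) and the function that issued it, and it treats the events of one invocation as one update-behavior subgraph whose only edges are program order. `Event`, `partition`, and the trace format are illustrative names, not Pathfinder's interface, and the paper's Persistence Graph carries more information than this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    call_id: int   # hypothetical: identifies one dynamic invocation of a function
    function: str  # function that issued the persistence operation
    op: str        # persistence operation, e.g. "write", "fsync", "rename"
    target: str    # file or persistent region the operation touches


def partition(trace):
    """Split a flat trace into one subgraph per invocation.

    Each subgraph is (nodes, edges): nodes keep trace order, and edges link
    consecutive nodes, so only program order is modelled in this sketch.
    """
    by_call = defaultdict(list)
    for ev in trace:
        by_call[ev.call_id].append(ev)
    return {
        cid: (evs, [(i, i + 1) for i in range(len(evs) - 1)])
        for cid, evs in by_call.items()
    }


if __name__ == "__main__":
    trace = [
        Event(1, "Fn2", "write", "a.log"),
        Event(1, "Fn2", "fsync", "a.log"),
        Event(2, "Fn3", "write", "b.dat"),
        Event(2, "Fn3", "rename", "b.dat"),
        Event(2, "Fn3", "fsync", "b.dat"),
    ]
    for cid, (nodes, edges) in partition(trace).items():
        print(cid, nodes[0].function, len(nodes), edges)
```

Keying subgraphs by invocation rather than by function name keeps repeated calls of the same function separate, which is what a later grouping step needs in order to notice that they behave alike.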

Abstract

Crash consistency is essential for applications that must persist data. Crash-consistency testing is commonly used to find crash-consistency bugs in applications. The crash-state space grows exponentially as the number of operations in the program increases, necessitating techniques for pruning the search space. However, state-of-the-art crash-state space pruning is far from ideal. Some techniques look for known buggy patterns or bound the exploration for efficiency, but they sacrifice coverage and may miss bugs lodged deep within applications. Other techniques eliminate redundancy in the search space by skipping identical crash states, but they still fail to scale to larger applications. In this work, we propose representative testing: a new crash-state space reduction strategy that achieves high scalability and high coverage. Our key observation is that the consistency of crash states is often correlated, even if those crash states are not identical. We build Pathfinder, a crash-consistency testing tool that implements a heuristic based on update behaviors to approximate a small set of representative crash states. We evaluate Pathfinder on POSIX-based and MMIO-based applications, where it finds 18 bugs (7 new) across 8 production-ready systems. Pathfinder scales more effectively to large applications than prior works and finds 4x more bugs in POSIX-based applications and 8x more bugs in MMIO-based applications than state-of-the-art systems.
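To make the exponential growth concrete, the following Python sketch enumerates crash states under a deliberately simplified model (an assumption, not the paper's model): the n writes issued after the last durability barrier may reach storage in any combination, so each subset is a distinct crash state and there are 2^n of them. `crash_states` is a hypothetical helper.

```python
from itertools import combinations


def crash_states(pending_writes):
    """Enumerate every subset of pending writes that could be durable at a crash.

    Simplifying assumption: writes after the last barrier (e.g. fsync) are
    unordered, so any subset of them may have persisted when the crash occurs.
    """
    states = []
    for k in range(len(pending_writes) + 1):
        for subset in combinations(pending_writes, k):
            states.append(frozenset(subset))
    return states


if __name__ == "__main__":
    for n in range(1, 6):
        writes = [f"w{i}" for i in range(n)]
        print(n, len(crash_states(writes)))  # 2, 4, 8, 16, 32
```

Real file systems and persistent-memory models impose ordering guarantees that rule out some subsets, so actual counts are smaller, but the count still grows exponentially in the number of unordered operations, which is why pruning is needed.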

Paper Structure

This paper contains 48 sections, 6 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Correct execution trace and buggy execution trace of the SetCurrentFile bug, with example crash schedules and resulting crash states from each execution.
  • Figure 2: Executions of the insert and insert_ordered functions, respectively, with example crash schedules and resulting crash states from each execution.
  • Figure 3: Functional overview of Pathfinder.
  • Figure 4: Persistence graph creation for an example POSIX-based application (Step A & B). Fn1 is the function currently under execution, which invokes functions Fn2 to Fn5. Each function may persist data to the disk or alter file status on the disk.
  • Figure 5: Persistence graph creation for an example MMIO-based application (Step A & B). FnX is the function currently under execution, which calls functions FnY and FnZ to persist data in data structures M and N to disk.
  • ...and 5 more figures

Theorems & Definitions (5)

  • definition 1: Update Behavior
  • definition 2: Persistence Graph
  • definition 3: Persistence Graph Node Equivalence
  • definition 4: Persistence Graph Edge Equivalence
  • definition 5: Represents Relation
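Definitions 3 to 5 are not reproduced here, so the following Python sketch only illustrates the general shape of grouping by a represents relation under assumed definitions. It treats two nodes as equivalent when they perform the same operation on the same kind of target (concrete names such as `log.1` and `log.2` are abstracted to `log`). Because the subgraphs are program-ordered chains, equal node sequences imply equivalent edges. Subgraphs with the same canonical signature fall into one class, and only the first member of each class would be tested. `node_key`, `signature`, and `pick_representatives` are hypothetical helpers, not Pathfinder's API.

```python
from collections import defaultdict


def node_key(op, target):
    """Assumed node equivalence: same operation on the same kind of target."""
    kind = target.split(".")[0]  # "log.1" and "log.2" both map to "log"
    return (op, kind)


def signature(subgraph):
    """Canonical form of a program-ordered chain of (op, target) nodes.

    For a chain, equal node-key sequences also imply equivalent edges.
    """
    return tuple(node_key(op, target) for op, target in subgraph)


def pick_representatives(subgraphs):
    """Group subgraphs by signature; the first member of each class represents it."""
    classes = defaultdict(list)
    for name, sg in subgraphs.items():
        classes[signature(sg)].append(name)
    representatives = [names[0] for names in classes.values()]
    return representatives, dict(classes)


if __name__ == "__main__":
    subgraphs = {
        "call1": [("write", "log.1"), ("fsync", "log.1"), ("rename", "data.tmp")],
        "call2": [("write", "log.2"), ("fsync", "log.2"), ("rename", "data.tmp")],
        "call3": [("write", "log.3"), ("rename", "data.tmp"), ("fsync", "log.3")],
    }
    reps, classes = pick_representatives(subgraphs)
    print(reps)  # ['call1', 'call3']
```

In this toy run, three invocations collapse into two classes, so only two subgraphs' crash states would need exploring. The paper's own definitions, not this sketch, determine how Pathfinder actually decides equivalence and handles graphs that are not simple chains.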