Toward a Better Understanding of Probabilistic Delta Debugging

Mengxiao Zhang; Zhenyang Xu; Yongqiang Tian; Xinru Cheng; Chengnian Sun

TL;DR

This work analyzes ProbDD, the state-of-the-art probabilistic delta debugging technique, to uncover the mechanisms behind its performance and to simplify its design. The authors derive theoretical simplifications that link element probability $p_r$ and round-based subset size $s_r$, showing that probabilities effectively act as monotonically increasing counters and that $p_r$ evolves roughly as $p_r \approx p_0/(1-e^{-1})^r$, while $s_r$ can be precomputed as $s_r \approx -1/\ln(1-p_r)$. Based on these insights, they propose Counter-Based Delta Debugging (CDD), which removes probability calculations in favor of round-driven sizing, yet retains the same practical performance as ProbDD across 76 benchmarks spanning test input minimization and software debloating. Empirical results reveal that ProbDD and CDD achieve substantially lower time and query counts than ddmin, with randomness contributing little to performance, and that the main efficiency gains arise from skipping inefficient complement and revisit queries. The paper concludes with a discussion of trade-offs, limitations such as incomplete 1-minimality, and public release of artifacts and integration with the Perses project to facilitate future research and applications.
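The two approximations quoted above can be tabulated directly. The following is a minimal sketch rather than the authors' implementation: the function name `round_schedule`, the starting value `p0 = 0.1`, the rounding of $s_r$ to the nearest integer, and the stop condition once $p_r$ reaches 1 are all assumptions made for illustration.

```python
import math

def round_schedule(p0, rounds):
    """Return (p_r, s_r) pairs using the approximations from the summary.

    Hypothetical helper: p_r grows by a factor 1 / (1 - e^-1) each round,
    and s_r is taken as -1 / ln(1 - p_r), rounded and kept at least 1.
    """
    growth = 1.0 / (1.0 - math.exp(-1.0))  # roughly 1.582 per round
    schedule = []
    p = p0
    for _ in range(rounds):
        if p >= 1.0:
            # Assumed stop condition: the formula for s_r is undefined at p = 1.
            break
        s = max(1, round(-1.0 / math.log(1.0 - p)))
        schedule.append((p, s))
        p *= growth
    return schedule

schedule = round_schedule(p0=0.1, rounds=10)
for r, (p, s) in enumerate(schedule):
    print(f"round {r}: p_r = {p:.3f}, s_r = {s}")
```

Because $p_r$ only ever grows, the subset size only ever shrinks, so the round index alone determines the size. That is the observation a counter-based variant can exploit instead of tracking per-element probabilities.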

Abstract

Given a list L of elements and a property that L exhibits, ddmin is a well-known test input minimization algorithm designed to automatically eliminate irrelevant elements from L. This algorithm is extensively adopted in test input minimization and software debloating. Recently, ProbDD, an advanced variant of ddmin, has been proposed and achieved state-of-the-art performance. Employing Bayesian optimization, ProbDD predicts the likelihood of each element in L being essential, and statistically decides which elements and how many should be removed each time. Despite its impressive results, the theoretical probabilistic model of ProbDD is complex, and the specific factors driving its superior performance have not been investigated. In this paper, we conduct the first in-depth theoretical analysis of ProbDD, clarifying trends in probability and subset size changes while simplifying the probability model. Complementing this analysis, we perform empirical experiments, including success rate analysis, ablation studies, and analysis on trade-offs and limitations, to better understand and demystify this state-of-the-art algorithm. Our success rate analysis shows how ProbDD addresses bottlenecks of ddmin by skipping inefficient queries that attempt to delete complements of subsets and previously tried subsets. The ablation study reveals that randomness in ProbDD has no significant impact on efficiency. Based on these findings, we propose CDD, a simplified version of ProbDD, reducing complexity in both theory and implementation. Besides, the performance of CDD validates our key findings. Comprehensive evaluations across 76 benchmarks in test input minimization and software debloating show that CDD can achieve the same performance as ProbDD despite its simplification. These insights provide valuable guidance for future research and applications of test input minimization algorithms.
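For readers unfamiliar with ddmin, the following is a simplified sketch of the classic algorithm, not the implementation evaluated in the paper. The names `ddmin` and `has_property` and the chunking strategy are illustrative assumptions. Each iteration first tries keeping a single subset (that is, deleting its complement), then tries deleting one subset at a time, and finally refines the granularity.

```python
def ddmin(elements, has_property):
    """Simplified ddmin over a list; has_property(candidate) returns a bool."""
    n = 2  # current number of partitions (granularity)
    while len(elements) >= 2:
        chunk = max(1, len(elements) // n)
        subsets = [elements[i:i + chunk] for i in range(0, len(elements), chunk)]
        reduced = False
        # Step 1: keep only one subset, i.e. delete its complement.
        for subset in subsets:
            if has_property(subset):
                elements, n, reduced = subset, 2, True
                break
        # Step 2: delete one subset, i.e. keep its complement.
        if not reduced:
            for i in range(len(subsets)):
                complement = [e for j, s in enumerate(subsets) if j != i for e in s]
                if has_property(complement):
                    elements, n, reduced = complement, max(n - 1, 2), True
                    break
        # Step 3: nothing worked, so split more finely or stop.
        if not reduced:
            if n >= len(elements):
                break
            n = min(len(elements), n * 2)
    return elements

# Toy property: the "crash" needs elements 3 and 8 to both be present.
result = ddmin(list(range(1, 9)), lambda xs: 3 in xs and 8 in xs)
print(result)
```

In this sketch the step 1 queries and the repeated attempts after each granularity change correspond loosely to the complement-deleting and revisit queries that the abstract identifies as inefficient.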

Paper Structure (24 sections, 3 theorems, 9 equations, 2 figures, 5 tables, 2 algorithms)

Introduction
Preliminaries
The ddmin Algorithm
Probabilistic Delta Debugging (ProbDD)
Delving Deeper into Probability and Size
On the Probability in ProbDD
Assumption for Theoretical Analysis
Probability vs. Subset Size Correlation
Trend of Probability Changes
On the Size of Subsets in ProbDD
Empirical Experiments
Benchmarks
Evaluation Metrics
The Wrapping Frameworks
Reproduction Study of ProbDD
...and 9 more sections

Key Result

Lemma 3.1

If the number of elements in $L$ is always divisible by the subset size, then after each round, the probabilities of all elements will always remain the same.
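The lemma can be illustrated numerically. The sketch below rests on two assumptions that this page does not state: that ProbDD updates each element of a subset $S$ whose deletion fails as $p_i \leftarrow p_i / (1 - \prod_{j \in S}(1 - p_j))$, and that every deletion attempt in the round fails. The helper `run_round` is hypothetical.

```python
from math import prod

def run_round(probs, size):
    """Apply one round of failed deletion attempts over consecutive subsets."""
    updated = list(probs)
    for start in range(0, len(probs), size):
        block = range(start, min(start + size, len(probs)))
        # Assumed update rule for a failed query on this block.
        denom = 1.0 - prod(1.0 - probs[j] for j in block)
        for i in block:
            updated[i] = min(1.0, probs[i] / denom)
    return updated

# 8 elements, subset size 2: the size divides the list length.
divisible = run_round([0.1] * 8, size=2)
# 8 elements, subset size 3: the last subset holds only 2 elements.
uneven = run_round([0.1] * 8, size=3)

print(sorted(set(divisible)))
print(sorted(set(uneven)))
```

When the size divides the list length, every subset sees the same denominator, so all probabilities stay equal. In the uneven case the shorter final subset receives a different update, which is why the lemma needs the divisibility condition.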

Figures (2)

Figure 1: A running example in Python. (a) shows the original program, represented as a list of 8 elements ($l_1$, $l_2$, $\cdots$, $l_8$), in which $l_8$ (i.e., crash(c)) triggers the crash. (b) and (c) show the minimized results by ddmin and ProbDD, with removed elements masked in gray. Both minimized programs still trigger the crash. Note that ProbDD cannot consistently guarantee the result in (c) and might produce larger results, due to its inherent randomness.
Figure 2: Visualization of queries within ddmin, ProbDD and CDD. In ddmin, three types of queries are displayed via stacked bars, the height of which denotes the query number. Within each bar, the number of successful queries, total queries and the corresponding success rate are annotated.

Theorems & Definitions (6)

Lemma 3.1
proof
Lemma 3.2
proof
Lemma 3.3
proof
