BayesFLo: Bayesian fault localization of complex software systems
Yi Ji, Simon Mak, Ryan Lekivetz, Joseph Morgan
TL;DR
BayesFLo is a Bayesian fault localization method for complex software. It places a model over all candidate root causes (input combinations) and integrates domain knowledge through a product-form prior that encodes combination hierarchy and heredity. It partitions combinations into tested-and-passed, tested-and-failed (TF), and untested groups, and derives a tractable posterior for TF combinations using an alternative formulation linked to minimal set covers, computed via integer linear programming (ILP) and inclusion–exclusion. The approach yields probabilistic risk assessments, disentangles tied suspicious combinations, and reduces debugging costs, as demonstrated in TCAS and JMP Easy DOE case studies where the true root causes are identified with high posterior probability. This probabilistic, knowledge-guided framework is a practical enhancement over deterministic covering-array analyses, enabling more confident and cost-effective software fault diagnosis.
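The partition of combinations described above can be illustrated with a small Python sketch. The helper names (`combos_in`, `all_combos`, `partition`), the three-factor toy system, and the cap at two-factor combinations are hypothetical choices made for illustration and are not taken from the paper. The sketch also assumes that a combination appearing in any passed test cannot be a root cause.

```python
from itertools import combinations, product

def combos_in(test, max_order=2):
    """Combinations of (factor, level) pairs of order <= max_order covered by one test."""
    items = sorted(test.items())
    return {frozenset(c)
            for k in range(1, max_order + 1)
            for c in combinations(items, k)}

def all_combos(levels, max_order=2):
    """Every combination of order <= max_order that the system admits."""
    out = set()
    for k in range(1, max_order + 1):
        for factors in combinations(sorted(levels), k):
            for vals in product(*(levels[f] for f in factors)):
                out.add(frozenset(zip(factors, vals)))
    return out

def partition(tests, passed, levels, max_order=2):
    """Split combinations into tested-and-passed, tested-and-failed, and untested."""
    in_pass, in_fail = set(), set()
    for test, ok in zip(tests, passed):
        (in_pass if ok else in_fail).update(combos_in(test, max_order))
    tp = in_pass                      # seen in at least one passed test
    tf = in_fail - in_pass            # seen only in failed tests
    ut = all_combos(levels, max_order) - in_pass - in_fail
    return tp, tf, ut

# Hypothetical toy system: three binary factors, three test runs.
levels = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
tests = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
]
passed = [True, False, True]

tp, tf, ut = partition(tests, passed, levels)
print(len(tp), len(tf), len(ut))  # 11 4 3
```

In this toy example the single failed run leaves four tied suspects (B=1, A=1 with B=1, A=1 with C=0, and B=1 with C=0), which is the kind of tie a probabilistic ranking is meant to break.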
Abstract
Software testing is essential for the reliable development of complex software systems. A key step in software testing is fault localization, which uses test data to pinpoint failure-inducing combinations for further diagnosis. Existing fault localization methods have two key limitations: they (i) do not incorporate domain and/or structural knowledge from test engineers, and (ii) do not provide a probabilistic assessment of risk for potential root causes. Such methods can thus fail to confidently whittle down the combinatorial number of potential root causes in complex systems, resulting in prohibitively high testing costs. To address these limitations, we propose a novel Bayesian fault localization framework called BayesFLo, which leverages a flexible Bayesian model for identifying potential root causes with probabilistic uncertainty. Using a carefully specified prior on root cause probabilities, BayesFLo permits the integration of domain and structural knowledge via the principles of combination hierarchy and heredity, which capture the expected structure of failure-inducing combinations. We then develop new algorithms for efficient computation of posterior root cause probabilities, leveraging recent tools from integer programming and graph representations. Finally, we demonstrate the effectiveness of BayesFLo over existing methods in two fault localization case studies on the Traffic Alert and Collision Avoidance System and the JMP Easy DOE platform.
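To make the idea of posterior root cause probabilities concrete, here is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper, which relies on integer programming and graph representations; it simply enumerates every root-cause assignment on a tiny example. Everything named here is an assumption for illustration: the helper names, the toy tests, independent Bernoulli indicators per combination, the prior values 0.2 (single inputs) and 0.05 (pairs) as a crude stand-in for combination hierarchy, and the rule that a test fails exactly when it contains at least one root cause. Heredity is left out for brevity.

```python
from itertools import combinations, product

def combos_in(test, max_order=2):
    """Combinations of (factor, level) pairs of order <= max_order covered by one test."""
    items = sorted(test.items())
    return {frozenset(c)
            for k in range(1, max_order + 1)
            for c in combinations(items, k)}

def tf_posterior(tests, passed, prior, max_order=2):
    """Posterior root-cause probability of each tested-and-failed combination."""
    cleared = set()
    for test, ok in zip(tests, passed):
        if ok:
            cleared |= combos_in(test, max_order)   # cannot be root causes
    # Remaining suspects in each failed test.
    suspects = [combos_in(t, max_order) - cleared
                for t, ok in zip(tests, passed) if not ok]
    tf = sorted(set().union(*suspects), key=lambda c: (len(c), sorted(c)))

    total, marginal = 0.0, {c: 0.0 for c in tf}
    for z in product([0, 1], repeat=len(tf)):
        roots = {c for c, zc in zip(tf, z) if zc}
        # Keep assignments that explain every failed test.
        if not all(roots & s for s in suspects):
            continue
        weight = 1.0
        for c, zc in zip(tf, z):
            weight *= prior(c) if zc else 1.0 - prior(c)
        total += weight
        for c in roots:
            marginal[c] += weight
    if total == 0.0:
        raise ValueError("no assignment explains the failures; raise max_order")
    return {c: m / total for c, m in marginal.items()}

# Hypothetical prior: lower-order combinations are more likely root causes.
def prior(combo):
    return {1: 0.2, 2: 0.05}[len(combo)]

tests = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
]
passed = [True, False, True]

post = tf_posterior(tests, passed, prior)
for combo, prob in sorted(post.items(), key=lambda kv: -kv[1]):
    print(sorted(combo), round(prob, 3))
```

Under these assumed priors the single input B=1 receives a posterior of roughly 0.64, while each of the three suspicious pairs receives roughly 0.16, so the four combinations that a deterministic analysis would treat as tied are ranked. The enumeration grows exponentially in the number of tested-and-failed combinations, which is why it is only practical for toy examples.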
