
Improving Spectrum-Based Localization of Multiple Faults by Iterative Test Suite Reduction

Dylan Callaghan, Bernd Fischer

TL;DR

This paper tackles the deterioration of spectrum-based fault localization (SBFL) in multi-fault programs by introducing FLITSR, a purely SBFL-based, iterative test-suite reduction technique that constructs a fault-covering basis and re-ranks elements. FLITSR* extends this approach to yield multiple bases across rounds, mitigating fault masking and dominator effects. Across two large datasets (synthetic multi-fault variants and Defects4J real faults), FLITSR and FLITSR* yield substantial reductions in wasted effort and improved precision and recall, with FLITSR outperforming the state-of-the-art GRACE on method-level real faults. The work demonstrates that FLITSR generalizes across SBFL metrics, offering a practical, scalable solution that enhances multi-fault localization without additional modeling or training.
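
To make the basis idea concrete, here is a minimal Python sketch. It assumes a spectrum given as a map from each test to the set of elements it executes, uses Ochiai as the base metric, and follows a simplified loop: pick the top-scoring element over the remaining tests, add it to the basis, drop the failing tests that execute it, and repeat until no failing tests remain. Basis elements are then placed first in selection order, followed by the other elements in base-metric order. The function names (`flitsr_sketch`, `score_all`, `rank`) and the tie-breaking by element name are hypothetical. The paper's algorithm has details (tie handling, basis ordering, the extra rounds of FLITSR*) that this sketch omits.

```python
import math

# Hypothetical, simplified sketch of FLITSR-style test suite reduction;
# not the authors' implementation (no FLITSR* rounds, assumed tie-breaking).


def ochiai(ef, ep, n_failing):
    """Ochiai score ef / sqrt(F * (ef + ep)), F = number of failing tests."""
    denom = math.sqrt(n_failing * (ef + ep))
    return ef / denom if denom else 0.0


def score_all(spectrum, failed, tests, elements):
    """Ochiai score of every element, computed over the given subset of tests."""
    tests = list(tests)
    n_failing = sum(1 for t in tests if failed[t])
    scores = {}
    for e in elements:
        ef = sum(1 for t in tests if failed[t] and e in spectrum[t])
        ep = sum(1 for t in tests if not failed[t] and e in spectrum[t])
        scores[e] = ochiai(ef, ep, n_failing)
    return scores


def rank(scores, elements):
    """Descending by score; ties broken by element name (an assumption)."""
    return sorted(elements, key=lambda e: (-scores[e], e))


def flitsr_sketch(spectrum, failed):
    """Greedy fault-covering basis, then basis-first re-ranking."""
    elements = sorted(set().union(*spectrum.values()))
    base = score_all(spectrum, failed, spectrum.keys(), elements)
    basis, remaining = [], set(spectrum)
    while any(failed[t] for t in remaining):
        scores = score_all(spectrum, failed, remaining, elements)
        candidates = [e for e in elements if e not in basis]
        if not candidates:
            break
        top = rank(scores, candidates)[0]
        if scores[top] == 0.0:  # no candidate covers the remaining failures
            break
        basis.append(top)
        # Reduce the suite: drop the failing tests that `top` already explains.
        remaining = {t for t in remaining if not (failed[t] and top in spectrum[t])}
    rest = rank(base, [e for e in elements if e not in basis])
    return basis + rest


# Hypothetical spectrum: faults f1 and f2, non-faulty element a.
spectrum = {
    "t1": {"f1", "a"}, "t2": {"f1", "a"}, "t3": {"f1", "a"},
    "t4": {"f2"}, "t5": {"f2"}, "t6": {"a"},
}
failed = {"t1": True, "t2": True, "t3": True, "t4": True, "t5": False, "t6": False}
print(flitsr_sketch(spectrum, failed))  # ['f1', 'f2', 'a']
```

In this hypothetical spectrum, `a` co-occurs with fault `f1` in three failing tests, so plain Ochiai ranks it above the second fault `f2` (`['f1', 'a', 'f2']`). Once the failing tests explained by `f1` are removed, `f2` becomes the top element of the reduced suite and moves up to second place.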

Abstract

Spectrum-based fault localization (SBFL) works well for single-fault programs, but its accuracy decays as the number of faults increases. We present FLITSR (Fault Localization by Iterative Test Suite Reduction), a novel SBFL extension that improves the localization of a given base metric specifically in the presence of multiple faults. FLITSR iteratively selects reduced versions of the test suite that better localize the individual faults in the system. This allows it to identify and re-rank faults that the base metric ranks too low because they are masked by other program elements. We evaluated FLITSR over method-level spectra from an existing large synthetic dataset comprising 75,000 variants of 15 open-source projects with up to 32 injected faults, as well as over method-level and statement-level spectra from a new dataset of 326 true multi-fault versions from the Defects4J benchmark set containing up to 14 real faults. For all three spectrum types, we consistently see substantial reductions in average wasted effort at different fault levels, of 30%-90% over the best base metric, and generally similarly large increases in precision and recall, albeit with larger variance across the underlying projects. For the method-level real faults, FLITSR also substantially outperforms GRACE, a state-of-the-art learning-based fault localizer.
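
The evaluation measures named above can also be sketched in Python. This sketch assumes the common SBFL definitions: the wasted effort for a fault is the number of non-faulty elements ranked above it, and precision and recall at n are computed over the top n elements of a ranking. Ties are ignored, and the helper names are hypothetical; the paper's exact variants (for example, tie handling or which faults are averaged at each fault level) may differ.

```python
# Hypothetical helpers using common SBFL metric definitions (ties ignored).


def wasted_effort(ranking, faults):
    """Per fault: number of non-faulty elements ranked above it."""
    efforts, non_faulty_seen = [], 0
    for element in ranking:
        if element in faults:
            efforts.append(non_faulty_seen)
        else:
            non_faulty_seen += 1
    return efforts


def average_wasted_effort(ranking, faults):
    """Mean wasted effort over all faults that appear in the ranking."""
    efforts = wasted_effort(ranking, faults)
    return sum(efforts) / len(efforts)


def precision_recall_at(ranking, faults, n):
    """Precision and recall after inspecting the top-n ranked elements."""
    found = sum(1 for element in ranking[:n] if element in faults)
    return found / n, found / len(faults)


faults = {"f1", "f2"}
base_ranking = ["f1", "a", "f2"]  # hypothetical base-metric ranking
reranked = ["f1", "f2", "a"]      # hypothetical re-ranked result
print(average_wasted_effort(base_ranking, faults), average_wasted_effort(reranked, faults))
print(precision_recall_at(base_ranking, faults, 2), precision_recall_at(reranked, faults, 2))
```

For these two hypothetical rankings, moving `f2` ahead of the non-faulty `a` drops the average wasted effort from 0.5 to 0.0 and raises precision and recall at n = 2 from 0.5 to 1.0.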

Paper Structure

This paper contains 21 sections, 6 figures, 4 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Test suite, spectrum, and suspiciousness scores for the running example. ✓ and ✗ denote execution by passing and failing tests, respectively (failing tests are also highlighted in red). Faulty statements are boldfaced in the top row; scores of the most suspicious statements are highlighted in yellow in the bottom rows.
  • Figure 2: Recursion chain for FLITSR with Ochiai as base metric, over the test suite of the running example.
  • Figure 3: Ochiai scores for each FLITSR iteration in running example; the highest scores in each iteration are highlighted, and elements in the final basis boldfaced. Final ranking shown at bottom.
  • Figure 4: Spectrum (top block) and suspiciousness scores for running example with extended test suite. Scores for base metrics shown in second block, Ochiai scores for FLITSR and FLITSR* rounds in third and fourth blocks. Highest scores in each iteration are highlighted, and elements in any computed basis are boldfaced. Final ranking shown at bottom.
  • Figure 5: Percentage of bugs found over the proportion of methods inspected in rank order; the x-axis is log-scaled. (Top) Averaged over the synthetic fault dataset of Steimann et al. (Bottom) Averaged over the Defects4J multi-fault dataset.
  • ...and 1 more figure
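
The curves in Figure 5 can be approximated with a short Python sketch. It assumes each program version provides a ranking and its set of faulty methods, treats a fault as found once its method has been inspected, rounds the number of inspected elements up, and averages the per-version percentages. `found_curve` is a hypothetical name, and the paper's exact aggregation may differ.

```python
import math
from fractions import Fraction

# Hypothetical sketch of a "% of bugs found vs. proportion inspected" curve.


def found_curve(versions, proportions):
    """Average % of faults found after inspecting a proportion of each ranking.

    versions:    list of (ranking, faults) pairs, one per program version
    proportions: fractions of ranked elements inspected, e.g. [0.01, 0.1, 1.0]
    """
    curve = []
    for p in proportions:
        per_version = []
        for ranking, faults in versions:
            # Converting via str() keeps p's decimal value exact, so float
            # rounding in p * len(ranking) cannot push k past an integer.
            k = math.ceil(Fraction(str(p)) * len(ranking))
            found = sum(1 for element in ranking[:k] if element in faults)
            per_version.append(100.0 * found / len(faults))
        curve.append(sum(per_version) / len(per_version))
    return curve


versions = [
    (["f1", "a", "f2", "b"], {"f1", "f2"}),  # hypothetical version 1
    (["c", "d", "g1", "e"], {"g1"}),         # hypothetical version 2
]
print(found_curve(versions, [0.25, 0.5, 0.75, 1.0]))  # [25.0, 25.0, 100.0, 100.0]
```

Plotting such values against the inspected proportions on a logarithmic x-axis gives a curve in the style of Figure 5.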