Simple Fault Localization using Execution Traces
Julian Aron Prenner, Romain Robbes
TL;DR
This work addresses the limited performance of traditional spectrum-based fault localization (SBFL) by deriving count spectra, basic control-flow features, and lexical features from execution traces, then contextualizing them with a sliding window. A gradient-boosted model (LightGBM) is trained on the contextualized line-level features to predict buggy lines and is evaluated on RunBugRun and QuixBugs. The approach consistently outperforms standard SBFL metrics across multiple feature groups, notably with a window size of three, while remaining GPU-free and easy to integrate. The findings suggest that simple, trace-derived features can meaningfully improve fault localization (FL) without resorting to heavy static analysis or deep learning. The work also highlights data quality and quantity as critical factors for further improvements and for generalization to larger, real-world codebases.
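The sliding-window contextualization mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the centered window, the zero padding at file boundaries, the function name `contextualize`, and the two toy features per line are all assumptions made for illustration. Only the window size of three comes from the summary.

```python
from typing import List, Sequence


def contextualize(
    features: Sequence[Sequence[float]],
    window: int = 3,
    pad: float = 0.0,
) -> List[List[float]]:
    """Concatenate each line's features with those of its neighbours.

    Assumption: the window is centered on the line, and positions that
    fall outside the file are filled with `pad`.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not features:
        return []
    half = window // 2
    padding = [pad] * len(features[0])
    rows: List[List[float]] = []
    for i in range(len(features)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            # Use the neighbour's features if it exists, padding otherwise.
            row.extend(features[j] if 0 <= j < len(features) else padding)
        rows.append(row)
    return rows


# Hypothetical per-line features: [hits in failing runs, hits in passing runs]
line_features = [[1, 4], [1, 0], [0, 4]]
contextualized = contextualize(line_features, window=3)
```

Each output row has `window` times as many columns as the input, so a line-level classifier such as a gradient-boosted model sees the features of the preceding and following line alongside those of the line itself.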
Abstract
Traditional spectrum-based fault localization (SBFL) exploits differences in a program's coverage spectrum when run on passing and failing test cases. However, such runs can provide a wealth of additional information beyond mere coverage. Working with thousands of execution traces of short programs submitted to competitive programming contests and leveraging machine learning and additional runtime, control-flow and lexical features, we present simple ways to improve SBFL. We also propose a simple trick to integrate context information. Our approach outperforms SBFL formulae such as Ochiai on our evaluation set as well as QuixBugs and requires neither a GPU nor any form of advanced program analysis. Existing SBFL solutions could possibly be improved with reasonable effort by adopting some of the proposed ideas.
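For readers unfamiliar with the baseline, the following sketch shows how the Ochiai formula ranks lines from a coverage spectrum. It uses the standard definition, score = ef / sqrt(total_failed * (ef + ep)), where ef and ep are the numbers of failing and passing tests that execute a line. The data layout (one set of covered line numbers per test), the helper names, and the toy test suite are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Set


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Standard Ochiai suspiciousness; 0.0 when the denominator is zero."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def ochiai_scores(coverage: List[Set[int]], passed: List[bool]) -> Dict[int, float]:
    """Score every line that is covered by at least one test."""
    total_failed = sum(1 for ok in passed if not ok)
    lines = set().union(*coverage) if coverage else set()
    scores: Dict[int, float] = {}
    for line in lines:
        ef = sum(1 for cov, ok in zip(coverage, passed) if line in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, passed) if line in cov and ok)
        scores[line] = ochiai(ef, ep, total_failed)
    return scores


# Toy spectrum: three tests, the last one fails.
coverage = [{1, 2, 3}, {1, 3}, {1, 2}]
passed = [True, True, False]

scores = ochiai_scores(coverage, passed)
# Most suspicious line first; ties broken by line number.
ranking = sorted(scores, key=lambda line: (-scores[line], line))
```

In this toy spectrum, line 2 ranks first because it is executed by the failing test and by only one passing test. Ochiai uses only binary coverage, whereas the approach described in the abstract additionally draws on runtime, control-flow, and lexical features.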
