How Execution Features Relate to Failures: An Empirical Study and Diagnosis Approach

Marius Smytzek; Martin Eberlein; Lars Grunske; Andreas Zeller

TL;DR

Fault localization has traditionally relied on line coverage; this paper argues that a broader set of execution features can improve diagnostic power. It empirically analyzes 17 execution features across 310 bugs from 20 projects and introduces EFDD, an approach that extracts features from passing and failing executions and trains a decision tree to produce interpretable diagnoses of failures. Features such as scalar pairs and def-use pairs correlate strongly with failures (measured with Spearman's $\rho$), and combining multiple features improves localization over relying on any single one. The evaluation reports high predictive accuracy (approximately 89% overall) and practical runtimes, suggesting that interpretable, feature-driven diagnoses can substantially help developers debug and can support automated repair workflows.
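The TL;DR quantifies how strongly a feature relates to failure with Spearman's $\rho$. The sketch below shows one way such a coefficient could be computed for a single feature. It assumes (our assumption, not necessarily the paper's encoding) that each feature is recorded as a per-run indicator, 1 if the feature was observed in that run and 0 otherwise, and that each run's outcome is 1 for failing and 0 for passing. It ranks both vectors with tie-averaged ranks and takes the Pearson correlation of the ranks, which is the standard definition of Spearman's coefficient; for two binary vectors this reduces to the phi coefficient. The function names and example data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sx * sy)


def spearman_rho(feature, outcome):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(feature), average_ranks(outcome))


# Hypothetical data: feature[i] = 1 if, e.g., some def-use pair occurred in run i;
# outcome[i] = 1 if run i failed.
feature = [1, 1, 1, 0, 0, 0]
outcome = [1, 1, 0, 0, 0, 0]
print(round(spearman_rho(feature, outcome), 3))  # → 0.707
```

In practice, `scipy.stats.spearmanr` computes the same coefficient; the standard-library version only makes the ranking step explicit.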

Abstract

Fault localization is a fundamental aspect of debugging, aiming to identify code regions likely responsible for failures. Traditional techniques primarily correlate statement execution with failures, yet program behavior is shaped by many other execution features, such as variable values, branch conditions, and definition-use pairs, which can provide richer diagnostic insights. In an empirical study of 310 bugs across 20 projects, we analyzed 17 execution features and assessed their correlation with failure outcomes. Our findings suggest that fault localization benefits from a broader range of execution features: (1) scalar pairs exhibit the strongest correlation with failures; (2) beyond line executions, def-use pairs and functions executed are key indicators for fault localization; and (3) combining multiple features is more effective than relying on any individual feature. Building on these insights, we introduce a debugging approach that diagnoses the circumstances of failures. The approach extracts fine-grained execution features and trains a decision tree to differentiate passing and failing runs. From this model, we derive a diagnosis that pinpoints faulty locations and explains the underlying causes of the failure. Our evaluation demonstrates that the generated diagnoses achieve high predictive accuracy, reinforcing their reliability. These interpretable diagnoses help developers debug software efficiently by providing deeper insight into failure causes.
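The abstract's diagnosis step trains a decision tree that separates passing from failing runs and reads the diagnosis off the tree. The following is a minimal, self-contained sketch of that idea, not the authors' implementation. It assumes each run is represented as the set of execution features observed in it, grows a small CART-style tree that splits on feature presence using Gini impurity, and reports every root-to-leaf path ending in a failing leaf as a candidate diagnosis. The feature names (a line, a def-use pair, a scalar pair) and the `build_tree` and `failure_diagnoses` helpers are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class Node:
    label: Optional[str] = None        # leaves only: "PASS" or "FAIL"
    feature: Optional[str] = None      # inner nodes only: feature tested
    present: Optional["Node"] = None   # subtree for runs where the feature was observed
    absent: Optional["Node"] = None    # subtree for runs where it was not


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of outcome labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(runs: List[Set[str]], labels: List[str], features: List[str],
               depth: int = 0, max_depth: int = 3) -> Node:
    """Greedily split on the feature whose presence best separates PASS from FAIL."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth or not features:
        return Node(label=majority)
    best, best_score = None, gini(labels)
    for f in features:
        yes = [lab for run, lab in zip(runs, labels) if f in run]
        no = [lab for run, lab in zip(runs, labels) if f not in run]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:  # keep only splits that strictly reduce impurity
            best, best_score = f, score
    if best is None:
        return Node(label=majority)
    rest = [f for f in features if f != best]
    yes_idx = [i for i, run in enumerate(runs) if best in run]
    no_idx = [i for i, run in enumerate(runs) if best not in run]
    return Node(
        feature=best,
        present=build_tree([runs[i] for i in yes_idx], [labels[i] for i in yes_idx],
                           rest, depth + 1, max_depth),
        absent=build_tree([runs[i] for i in no_idx], [labels[i] for i in no_idx],
                          rest, depth + 1, max_depth),
    )


def failure_diagnoses(node: Node, path: tuple = ()) -> List[List[str]]:
    """Collect the conditions along every path that ends in a FAIL leaf."""
    if node.label is not None:
        return [list(path)] if node.label == "FAIL" else []
    return (failure_diagnoses(node.present, path + (f"{node.feature} observed",))
            + failure_diagnoses(node.absent, path + (f"{node.feature} not observed",)))


# Hypothetical runs: each is the set of features observed during one test execution.
runs = [
    {"line 12", "def-use x@3->x@7"},
    {"line 12", "def-use x@3->x@7", "scalar a<b @5"},
    {"line 12"},
    {"line 12", "scalar a<b @5"},
]
labels = ["FAIL", "FAIL", "PASS", "PASS"]
tree = build_tree(runs, labels, sorted(set().union(*runs)))
print(failure_diagnoses(tree))  # → [['def-use x@3->x@7 observed']]
```

In this toy data, line 12 runs in every execution and so cannot separate the outcomes, whereas the def-use pair does, which mirrors the abstract's point that features beyond line executions can sharpen localization. A real tool would more likely use an existing learner (for example, scikit-learn's `DecisionTreeClassifier`) than this hand-rolled tree.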
