Mining Action Rules for Defect Reduction Planning

Khouloud Oueslati; Gabriel Laberge; Maxime Lamothe; Foutse Khomh

TL;DR

CounterACT tackles defect reduction planning by replacing explanations of black-box predictions with counterfactual, action-rule-based plans mined from historical data. The approach combines actionable analysis, action-rule mining, and plan selection to produce transparent defect reduction plans at both the release and commit levels, and it outperforms baselines such as TimeLIME and XTREE. Empirically, the plans overlap strongly with the changes developers actually made (median overlap of 95% at the release level and 85.97% at the commit level) and offer a better trade-off between precision and recall (median F1-score of 88.12%). As additional validation, LLM code edits guided by the plans are more likely to pass relevant test cases than vanilla LLM recommendations. The work shows that counterfactual reasoning grounded in transparent rule mining can yield actionable guidance for software maintenance and can usefully augment LLMs for automated edits; all data and code are publicly available for replication.

Abstract

Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black box machine learning model and "explaining" its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered as a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT's explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer a better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language Models (LLMs) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.

Paper Structure (23 sections, 10 equations, 14 figures, 8 tables, 1 algorithm)

Introduction
Background and related work
Software Defect Prediction
Machine Learning Techniques
Post-hoc Explanations
Defect Reduction Planning
Planners
Plan Quality Assessment
Action rule mining
Methodology
Actionable Analysis
Mining Action Rules
Plan Selection
Experimental details
Release Level
...and 8 more sections

Figures (14)

Figure 1: How action rules are mined. First, classification rules with sufficient support and confidence are mined using Apriori. This leads to the two rules in the red and blue regions. Then, rules with matching antecedents but different consequents are combined, leading to the action rule $r=[(\text{avg\_cc}>1.4) \land (\text{cbo}>14 \rightarrow \text{cbo}\leq14)] \Rightarrow [\text{bug}\rightarrow \text{no-bug}]$ with support 6% and confidence 52%.
Figure 2: CounterACT: Overview of the approach. Note that, when evaluating other benchmark planners, the grey area is replaced by the corresponding planner.
Figure 3: CounterACT: an illustrative example
Figure 4: Median overlap scores between different plan selection methods. The reported values on the histograms are the standard deviations.
Figure 5: Results of top-M experiments. We chose the value of $M=10$, since it shows the most stable overlap and F1 score.
...and 9 more figures
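
The combination step described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not CounterACT's implementation: the `ClassRule`, `ActionRule`, and `combine` names are hypothetical, the classification rules are given by hand rather than mined with Apriori, and the numeric values are made up. The sketch also assumes one common convention from the action-rule literature, in which the action rule's support is the minimum of the two rule supports and its confidence is the product of the two rule confidences; the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRule:
    """A classification rule: antecedent conditions => class label."""
    antecedent: frozenset  # set of (attribute, condition) pairs
    consequent: str
    support: float
    confidence: float


@dataclass(frozen=True)
class ActionRule:
    """Conditions that stay fixed, conditions to change, and the class flip."""
    stable: frozenset      # (attribute, condition) pairs shared by both rules
    changes: tuple         # (attribute, from_condition, to_condition) triples
    from_class: str
    to_class: str
    support: float
    confidence: float


def combine(src, dst, flexible):
    """Pair two classification rules into an action rule, or return None.

    The rules must predict different classes, constrain the same attributes,
    and differ only on attributes listed in `flexible` (the actionable ones).
    """
    if src.consequent == dst.consequent:
        return None
    a, b = dict(src.antecedent), dict(dst.antecedent)
    if a.keys() != b.keys():
        return None
    stable = frozenset((k, v) for k, v in a.items() if b[k] == v)
    changes = tuple((k, a[k], b[k]) for k in sorted(a) if a[k] != b[k])
    if not changes or any(k not in flexible for k, _, _ in changes):
        return None
    # Assumed convention: support = min, confidence = product.
    return ActionRule(
        stable=stable,
        changes=changes,
        from_class=src.consequent,
        to_class=dst.consequent,
        support=min(src.support, dst.support),
        confidence=src.confidence * dst.confidence,
    )


# Illustrative rules shaped like the two in Figure 1 (values are made up).
r_bug = ClassRule(frozenset({("avg_cc", ">1.4"), ("cbo", ">14")}), "bug", 0.10, 0.90)
r_ok = ClassRule(frozenset({("avg_cc", ">1.4"), ("cbo", "<=14")}), "no-bug", 0.07, 0.70)

rule = combine(r_bug, r_ok, flexible={"cbo"})
print(rule)
```

Here `avg_cc` is left unchanged while `cbo` moves from `>14` to `<=14`, which mirrors the shape of the rule in the caption. Restricting changes to a `flexible` set is one simple way to keep a plan limited to attributes a developer can actually modify.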
