Improving classifier-based effort-aware software defect prediction by reducing ranking errors

Yuchen Guo; Martin Shepperd; Ning Li

TL;DR

This work reframes classifier-based effort-aware defect prediction as a ranking problem and identifies ranking errors arising from near-zero defective probabilities, termed Minor Chaos. It introduces EA-Z, a ranking score with a lower bound $\zeta$ that maps $p(x)$ to $p'(x)$ via $p'(x) = p(x) \cdot (1-\zeta) + \zeta$ and computes $EA_Z(x) = \frac{p'(x)}{LOC}$, with $\zeta = 0.05$ guiding the balance between approximation to the defect/LOC ratio and robustness to Minor Chaos. The authors evaluate EA-Z against four existing strategies across 72 real-world datasets using 16 classifiers (including imbalanced ensembles) in 61 cross-project/cross-version experiments, finding that EA-Z delivers the best Recall@20% and $P_{opt}$ on average, particularly with imbalanced ensembles like UBag-svm and UBst-rf, while maintaining acceptable IFA. These results demonstrate that mitigating ranking errors can meaningfully improve the cost-effectiveness of defect prediction and offer practical guidance for deploying EA-Z in software quality assurance workflows.
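The mapping above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`plain_score`, `ea_z_score`, `rank_modules`), the tuple layout, and the two example modules are assumptions made here, and `plain_score` (the unbounded $p(x)/LOC$ ratio) is included only as a point of comparison.

```python
def plain_score(p, loc):
    """Unbounded ratio p(x) / LOC, shown only for comparison (assumed baseline)."""
    return p / loc


def ea_z_score(p, loc, zeta=0.05):
    """EA-Z score as stated in the summary: p'(x) = p(x) * (1 - zeta) + zeta, then p'(x) / LOC."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if loc <= 0:
        raise ValueError("loc must be positive")
    p_prime = p * (1.0 - zeta) + zeta  # lower-bounds the probability at zeta
    return p_prime / loc


def rank_modules(modules, score_fn):
    """Return module names sorted by descending score.

    `modules` is a list of (name, probability, loc) tuples (hypothetical layout).
    """
    ranked = sorted(modules, key=lambda m: score_fn(m[1], m[2]), reverse=True)
    return [name for name, _, _ in ranked]


if __name__ == "__main__":
    # Two hypothetical modules whose predicted probabilities are both near zero.
    modules = [("A", 0.0001, 50), ("B", 0.002, 400)]
    print(rank_modules(modules, plain_score))  # ['B', 'A']
    print(rank_modules(modules, ea_z_score))   # ['A', 'B']
```

In this made-up example both probabilities are effectively zero, yet their ratio alone decides the plain ordering. With the lower bound $\zeta = 0.05$, the two adjusted probabilities become almost equal (about 0.0501 and 0.0519), so the smaller module is ranked first.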

Abstract

Context: Software defect prediction uses historical data to direct software quality assurance resources to potentially problematic components. Effort-aware (EA) defect prediction prioritizes more bug-like components by taking cost-effectiveness into account. In other words, it is a ranking problem; however, existing classification-based ranking strategies give limited consideration to ranking errors. Objective: Improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z, which sets a lower bound to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners and run experiments comparing EA-Z with these four existing strategies. Results: Experimental results from 72 datasets show that EA-Z is the best ranking score calculation strategy in terms of Recall@20% and Popt when considering all 16 learners. Among individual learners, the imbalanced ensemble learners UBag-svm and UBst-rf achieve top performance with EA-Z. Conclusion: Our study indicates the effectiveness of reducing ranking errors for classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
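The abstract reports Recall@20% without defining it. The sketch below follows a reading that is common in effort-aware defect prediction: the share of defective modules reached when modules are inspected in ranked order until 20% of the total LOC has been spent. This is an assumption, and the paper's exact definition may differ. The function name `recall_at_effort`, the `(loc, is_defective)` layout, and the rule that a module counts only if it fits entirely within the budget are likewise choices made here for illustration.

```python
def recall_at_effort(modules, scores, effort_fraction=0.2):
    """Fraction of defective modules found within an inspection budget.

    `modules` is a list of (loc, is_defective) pairs and `scores` holds one
    ranking score per module (higher score is inspected first). The budget is
    `effort_fraction` of the total LOC. Assumption: inspection stops at the
    first module that does not fit entirely within the remaining budget.
    """
    if len(modules) != len(scores):
        raise ValueError("modules and scores must have the same length")

    total_loc = sum(loc for loc, _ in modules)
    total_defective = sum(1 for _, defective in modules if defective)
    if total_defective == 0:
        return 0.0  # no defects to find; recall is reported as 0 by convention here

    budget = effort_fraction * total_loc
    order = sorted(range(len(modules)), key=lambda i: scores[i], reverse=True)

    inspected_loc = 0
    found = 0
    for i in order:
        loc, defective = modules[i]
        if inspected_loc + loc > budget:
            break
        inspected_loc += loc
        if defective:
            found += 1
    return found / total_defective


if __name__ == "__main__":
    # Hypothetical data: 200 LOC in total, so a 20% budget covers 40 LOC.
    modules = [(10, True), (20, False), (70, True), (100, False)]
    scores = [0.9, 0.5, 0.4, 0.1]
    print(recall_at_effort(modules, scores))  # 0.5
```

Only the first two modules (30 LOC) fit in the 40-LOC budget, and one of the two defective modules is among them, which gives a recall of 0.5.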

Paper Structure (15 sections, 5 equations, 12 figures, 7 tables)

Introduction
Related Work
Effort-aware Ranking Prediction
The Problem of Minor Chaos
Ranking Score Calculation Strategy EA-Z
Evaluation and Experimental Method
Evaluation
Datasets and Data Preparation
Learners and Classifier Settings
Experimental results
Ranking Strategy Comparison
Learner Comparison and Baseline
Additional analysis for Zeta
Threats to Validity
Conclusions

Figures (12)

Figure 1: Cost-effectiveness Curve
Figure 2: Example of Minor Chaos
Figure 3: Minor Chaos Reduce Performance
Figure 4: Recall@20% performance of 5 ranking strategies
Figure 5: Popt performance of 5 ranking strategies
...and 7 more figures
