Improving classifier-based effort-aware software defect prediction by reducing ranking errors
Yuchen Guo, Martin Shepperd, Ning Li
TL;DR
This work reframes classifier-based effort-aware defect prediction as a ranking problem and identifies ranking errors arising from near-zero defective probabilities, termed Minor Chaos. It introduces EA-Z, a ranking score with a lower bound $\zeta$ that maps $p(x)$ to $p'(x)$ via $p'(x) = p(x) \cdot (1-\zeta) + \zeta$ and computes $EA_Z(x) = \frac{p'(x)}{LOC}$, with $\zeta = 0.05$ guiding the balance between approximation to the defect/LOC ratio and robustness to Minor Chaos. The authors evaluate EA-Z against four existing strategies across 72 real-world datasets using 16 classifiers (including imbalanced ensembles) in 61 cross-project/cross-version experiments, finding that EA-Z delivers the best Recall@20% and $P_{opt}$ on average, particularly with imbalanced ensembles like UBag-svm and UBst-rf, while maintaining acceptable IFA. These results demonstrate that mitigating ranking errors can meaningfully improve the cost-effectiveness of defect prediction and offer practical guidance for deploying EA-Z in software quality assurance workflows.
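The EA-Z score described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `ea_z_scores` and `rank_by_score` are hypothetical, and it assumes the classifier outputs a defect probability in [0, 1] and that every module has a positive LOC count.

```python
from typing import List, Sequence


def ea_z_scores(probs: Sequence[float], locs: Sequence[int],
                zeta: float = 0.05) -> List[float]:
    """Compute EA_Z(x) = p'(x) / LOC with p'(x) = p(x) * (1 - zeta) + zeta.

    Hypothetical helper: names and input validation are assumptions,
    only the formula and the default zeta = 0.05 come from the text.
    """
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    if len(probs) != len(locs):
        raise ValueError("probs and locs must have the same length")
    scores = []
    for p, loc in zip(probs, locs):
        if loc <= 0:
            raise ValueError("LOC must be positive")
        # Rescale p from [0, 1] to [zeta, 1] so it never reaches zero.
        p_adjusted = p * (1.0 - zeta) + zeta
        scores.append(p_adjusted / loc)
    return scores


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return module indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Two modules with p = 0: plain p / LOC would tie both at 0, whereas
    # the lower bound makes the score zeta / LOC, so the smaller module
    # is ranked first.
    print(rank_by_score(ea_z_scores([0.0, 0.0], [200, 50])))
```

One consequence of the formula that the sketch makes visible: when $p(x)$ is close to zero, the score is dominated by $\zeta / LOC$, so the ordering among such modules depends mainly on their size rather than on small fluctuations in the predicted probability.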
Abstract
Context: Software defect prediction uses historical data to direct software quality assurance resources to potentially problematic components. Effort-aware (EA) defect prediction prioritizes more bug-like components by taking cost-effectiveness into account. In other words, it is a ranking problem; however, existing classification-based ranking strategies give limited consideration to ranking errors. Objective: Improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z, which sets a lower bound to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners and conduct experiments comparing EA-Z with these four existing strategies. Results: Experimental results from 72 data sets show that EA-Z is the best ranking score calculation strategy in terms of Recall@20% and $P_{opt}$ when considering all 16 learners. Among individual learners, the imbalanced ensemble learners UBag-svm and UBst-rf achieve top performance with EA-Z. Conclusion: Our study indicates the effectiveness of reducing ranking errors for classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
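To make the Recall@20% criterion concrete, the sketch below shows one common way such a metric is computed. The abstract does not define it, so the definition used here is an assumption: modules are inspected in descending score order until adding the next one would exceed 20% of the total LOC, and the result is the fraction of defective modules inspected. The function name `recall_at_effort` is hypothetical.

```python
from typing import Sequence


def recall_at_effort(scores: Sequence[float], locs: Sequence[int],
                     defective: Sequence[bool], effort: float = 0.20) -> float:
    """Fraction of defective modules found within an LOC inspection budget.

    Assumed definition (not taken from the paper): inspect modules by
    descending score and stop before the cumulative LOC would exceed
    `effort` times the total LOC.
    """
    total_defective = sum(1 for d in defective if d)
    if total_defective == 0:
        return 0.0
    budget = effort * sum(locs)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected_loc = 0
    found = 0
    for i in order:
        if inspected_loc + locs[i] > budget:
            break  # the next module does not fit in the remaining budget
        inspected_loc += locs[i]
        if defective[i]:
            found += 1
    return found / total_defective


if __name__ == "__main__":
    # Total LOC is 100, so the budget is 20 LOC: the first two modules
    # (15 LOC) fit and the third (45 LOC) does not.
    print(recall_at_effort(
        scores=[0.9, 0.5, 0.1, 0.05],
        locs=[10, 5, 45, 40],
        defective=[True, False, True, False],
    ))
```

Other conventions exist, for example counting individual defects rather than defective modules, or allowing partial inspection of the module that crosses the budget; the paper's exact definition should take precedence over this sketch.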
