Predicting post-release defects with knowledge units (KUs) of programming languages: an empirical study
Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan, Zhen Ming Jiang
TL;DR
This study introduces Knowledge Units (KUs) as features for predicting post-release defects. A KU is a cohesive set of capabilities offered by the building blocks of a programming language; here, the KUs are derived from Java certification exam topics. The authors develop KUM, a KU-based model, and compare it against baselines built from traditional metrics. KUM delivers strong predictive power (median AUC ≈ 0.82) and often outperforms baselines built from a single group of traditional metrics, although the TM baseline remains the strongest of the baselines. Combining KU features with traditional metrics (KUM+TM) yields the best results (median AUC ≈ 0.89), with ADEV (active developers) emerging as the top predictor and KU features such as Method & Encapsulation contributing significantly. A cost-effective variant (COST_EFF) uses only 10 features yet remains competitive (median AUC ≈ 0.87), which reduces feature engineering costs. The results underscore the complementary value of KUs to traditional metrics, offer actionable interpretability via SHAP, and point to future work on scalable KU elicitation and cross-language applications.
Abstract
Defect prediction plays a crucial role in software engineering, enabling developers to identify defect-prone code and improve software quality. While extensive research has focused on refining machine learning models for defect prediction, the exploration of new data sources for feature engineering remains limited. Defect prediction models primarily rely on traditional metrics such as product, process, and code ownership metrics, which, while effective, do not capture language-specific traits that may influence defect proneness. To address this gap, we introduce Knowledge Units (KUs) of programming languages as a novel feature set for analyzing software systems and predicting defects. A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We conduct an empirical study leveraging 28 KUs that are derived from Java certification exams and compare their effectiveness against traditional metrics in predicting post-release defects across 8 well-maintained Java software systems. Our results show that KUs provide significant predictive power, achieving a median AUC of 0.82 and outperforming models built on individual groups of traditional metrics. Among KU features, Method & Encapsulation, Inheritance, and Exception Handling emerge as the most influential predictors. Furthermore, combining KUs with traditional metrics enhances prediction performance, yielding a median AUC of 0.89. We also introduce a cost-effective model using only 10 features, which maintains strong predictive performance while reducing feature engineering costs. Our findings demonstrate the value of KUs in predicting post-release defects, offering a complementary perspective to traditional metrics. This study can be helpful to researchers who wish to analyze software systems from a perspective that is complementary to that of traditional metrics.
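The abstract reports model quality as AUC. As a reading aid, the sketch below computes AUC directly from its rank-based definition: the probability that a randomly chosen defective file receives a higher predicted score than a randomly chosen clean file, with ties counted as half. This is only an illustration of the metric, not the authors' evaluation pipeline. The function name `auc` and the toy scores and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a defective file > score of a clean file).

    scores: predicted defect-proneness per file (higher = more defect-prone)
    labels: 1 if the file had a post-release defect, 0 otherwise
    """
    defective = [s for s, y in zip(scores, labels) if y == 1]
    clean = [s for s, y in zip(scores, labels) if y == 0]
    if not defective or not clean:
        raise ValueError("need at least one defective and one clean file")

    wins = 0.0
    for d in defective:
        for c in clean:
            if d > c:
                wins += 1.0   # defective file ranked above clean file
            elif d == c:
                wins += 0.5   # ties count as half a win
    return wins / (len(defective) * len(clean))


# Hypothetical toy data (not from the paper): six files, three of them defective.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"AUC on toy data: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the reported medians of 0.82 and 0.89 should be read.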
