Predicting post-release defects with knowledge units (KUs) of programming languages: an empirical study

Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan, Zhen Ming Jiang

TL;DR

This study introduces Knowledge Units (KUs) as features for predicting post-release defects. A KU is a cohesive set of capabilities offered by the building blocks of a programming language; here the KUs are derived from Java certification exam topics. The authors build KUM, a KU-based model, and compare it against baselines built from traditional metrics. KUM delivers strong predictive power (median AUC ≈ 0.82) and often outperforms baselines built from a single group of traditional metrics, while the TM (traditional metrics) baseline remains the strongest baseline overall. Combining KU features with traditional metrics (KUM+TM) yields the best results (median AUC ≈ 0.89), with ADEV (active developers) emerging as the top predictor and KU features such as Method & Encapsulation contributing significantly. A cost-effective variant (COST_EFF) uses only 10 features yet remains competitive (median AUC ≈ 0.87), which reduces feature engineering costs in practice. The results underscore the complementary value of KUs alongside traditional metrics, offer actionable interpretability via SHAP, and point to future work on scalable KU elicitation and cross-language applications.

Abstract

Defect prediction plays a crucial role in software engineering, enabling developers to identify defect-prone code and improve software quality. While extensive research has focused on refining machine learning models for defect prediction, the exploration of new data sources for feature engineering remains limited. Defect prediction models primarily rely on traditional metrics such as product, process, and code ownership metrics, which, while effective, do not capture language-specific traits that may influence defect proneness. To address this gap, we introduce Knowledge Units (KUs) of programming languages as a novel feature set for analyzing software systems and defect prediction. A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We conduct an empirical study leveraging 28 KUs that are derived from Java certification exams and compare their effectiveness against traditional metrics in predicting post-release defects across 8 well-maintained Java software systems. Our results show that KUs provide significant predictive power, achieving a median AUC of 0.82 and outperforming models built on individual groups of traditional metrics. Among KU features, Method & Encapsulation, Inheritance, and Exception Handling emerge as the most influential predictors. Furthermore, combining KUs with traditional metrics enhances prediction performance, yielding a median AUC of 0.89. We also introduce a cost-effective model using only 10 features, which maintains strong predictive performance while reducing feature engineering costs. Our findings demonstrate the value of KUs in predicting post-release defects, offering a complementary perspective to traditional metrics. This study can be helpful to researchers who wish to analyze software systems from a perspective that is complementary to that of traditional metrics.
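
The abstract reports performance as a median AUC. As a rough illustration of what that statistic means, the sketch below computes AUC as the probability that a defective file receives a higher predicted score than a clean one, then takes the median over several runs. The function name `auc`, the `toy_runs` data, and the choice to aggregate per system are assumptions made for illustration; this excerpt does not say how the study aggregates its AUC values, and none of the numbers below come from the paper.

```python
from statistics import median


def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    labels: 1 for a defective file, 0 for a clean file.
    scores: predicted defect-proneness, higher means more likely defective.
    Ties between a defective and a clean file count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (labels, scores) for three made-up systems.
toy_runs = {
    "system_a": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "system_b": ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),
    "system_c": ([1, 0, 0, 1], [0.60, 0.60, 0.10, 0.95]),
}

# One AUC per run, then the median across runs.
per_system_auc = {name: auc(y, s) for name, (y, s) in toy_runs.items()}
median_auc = median(per_system_auc.values())
print(per_system_auc, median_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported medians of 0.82 and 0.89.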

Paper Structure

This paper contains 20 sections, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Our metamodel for knowledge units (KUs).
  • Figure 2: An overview of our data collection process.
  • Figure 3: An overview of our approach for building and evaluating KUM.
  • Figure 4: The distribution of AUC of KUM and other studied models. The models are grouped based on their performance rankings determined by the Scott-Knott ESD (SK-ESD) method, where a lower SK-ESD rank indicates a better-performing model.
  • Figure 5: The rank distribution of features of KUM for defect classification. The number inside the square brackets indicates the final rank of the feature after applying Scott-Knott ESD for the second time.
  • ...and 6 more figures