Reducing Labeling Effort in Architecture Technical Debt Detection through Active Learning and Explainable AI

Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi

TL;DR

This study refined an existing dataset of 116 ATD-related Jira issues from prior work into 57 expert-validated items, used them to extract representative keywords, and applied SHAP and LIME to explain the outcomes of automated ATD classification.

Abstract

Self-Admitted Technical Debt (SATD) refers to technical compromises explicitly admitted by developers in natural language artifacts such as code comments, commit messages, and issue trackers. Among its types, Architecture Technical Debt (ATD) is particularly difficult to detect due to its abstract and context-dependent nature. Manual annotation of ATD is costly, time-consuming, and challenging to scale. This study focuses on reducing labeling effort in ATD detection by combining keyword-based filtering with active learning and explainable AI. We refined an existing dataset of 116 ATD-related Jira issues from prior work, producing 57 expert-validated items used to extract representative keywords. These were applied to identify over 103,000 candidate issues across ten open-source projects. To assess the reliability of this keyword-based filtering, we conducted a qualitative evaluation of a statistically representative sample of labeled issues. Building on this filtered dataset, we applied active learning with multiple query strategies to prioritize the most informative samples for annotation. Our results show that the Breaking Ties strategy consistently improves model performance, achieving the highest F1-score of 0.72 while reducing the annotation effort by 49%. To enhance model transparency, we applied SHAP and LIME to explain the outcomes of automated ATD classification. Expert evaluation revealed that both LIME and SHAP provided reasonable explanations, with the usefulness of the explanations often depending on the relevance of the highlighted features. Notably, experts preferred LIME overall for its clarity and ease of use.
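
The Breaking Ties strategy named in the abstract is commonly described as selecting the unlabeled samples whose two most probable classes are closest in predicted probability. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `breaking_ties_query` and the toy probabilities are hypothetical and chosen only for illustration.

```python
def breaking_ties_query(probabilities, batch_size):
    """Return indices of the samples with the smallest top-two probability margin.

    probabilities: list of per-sample class-probability lists (assumed input,
    e.g. the output of a classifier's predict_proba on the unlabeled pool).
    """
    margins = []
    for index, row in enumerate(probabilities):
        # The two largest class probabilities for this sample.
        top_two = sorted(row, reverse=True)[:2]
        margins.append((top_two[0] - top_two[1], index))
    # A small margin means the model is nearly tied between two classes.
    margins.sort()
    return [index for _, index in margins[:batch_size]]


# Hypothetical pool of four issues with (not-ATD, ATD) probabilities.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
print(breaking_ties_query(pool_probs, batch_size=2))  # prints [3, 1]
```

In an active learning loop, the returned indices would be sent to annotators, the labeled items added to the training set, and the model retrained before the next query.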

Paper Structure (42 sections, 1 equation, 7 figures, 10 tables)

Figures (7)

  • Figure 1: Overview of the proposed approach
  • Figure 2: Example of an issue from CAMEL-19998
  • Figure 3: Example of bi-gram chunking and similarity scoring for CAMEL-19998. The phrase "cyclic dependency" is identified as the most semantically similar chunk to ATD-related keywords, based on cosine similarity scores
  • Figure 4: Query strategies used
  • Figure 5: LIME plot explanation for a Jira issue (CAMEL-19998) classified as ATD
  • ...and 2 more figures
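
The bi-gram chunking and similarity scoring mentioned in the caption of Figure 3 can be illustrated with a small sketch. This is an assumption-laden stand-in: it uses a plain bag-of-words cosine similarity rather than whatever embedding model the paper uses, and the example sentence, the keyword list, and the helper names (`bigram_chunks`, `cosine`, `best_chunk`) are hypothetical, not taken from CAMEL-19998 or the paper.

```python
import math
from collections import Counter


def bigram_chunks(text):
    """Split text into overlapping two-word chunks."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two phrases."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def best_chunk(text, keywords):
    """Return the (chunk, score) pair most similar to any keyword."""
    best = ("", 0.0)
    for chunk in bigram_chunks(text):
        score = max(cosine(chunk, keyword) for keyword in keywords)
        if score > best[1]:
            best = (chunk, score)
    return best


# Hypothetical issue text and keyword list, for illustration only.
issue_text = "remove the cyclic dependency between modules"
atd_keywords = ["cyclic dependency", "tight coupling"]
chunk, score = best_chunk(issue_text, atd_keywords)
print(chunk)
```

With a semantic embedding model in place of the bag-of-words vectors, chunks that paraphrase a keyword without sharing its exact words could also score highly.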