Exploring the Advances in Using Machine Learning to Identify Technical Debt and Self-Admitted Technical Debt

Eric L. Melin, Nasir U. Eisty

TL;DR

This paper addresses the challenge of automatically identifying technical debt (TD) and self-admitted technical debt (SATD) in software projects using machine learning. It conducts a structured literature review of studies published up to 2024 across five major databases and applies snowballing to assemble a comprehensive set of studies. The findings indicate that there is no universally best method for TD: tree-based methods such as Random Forest (RF) and XGBoost perform well in several studies. SATD detection, by contrast, consistently benefits from transformer-based models of the BERT family, which often achieve the highest F1 scores. The work offers practical guidance for researchers: prioritize BERT-based models for SATD detection and consider data-source characteristics when selecting ML techniques. It also acknowledges methodological threats to validity and the need for continued cross-domain evaluation.

Abstract

In software engineering, technical debt, which signifies the compromise between short-term expediency and long-term maintainability, is being addressed by researchers through various machine learning approaches. This study reflects on the current research landscape of machine learning methods for detecting technical debt and self-admitted technical debt in software projects, and compares the machine learning research on the two. We performed a literature review of studies published up to 2024 that discuss the identification of technical debt and self-admitted technical debt using machine learning. Our findings reveal that a diverse range of machine learning techniques is in use, with BERT models proving significantly more effective than others. This study demonstrates that although the performance of techniques has improved over the years, no single approach has been universally adopted. The results suggest prioritizing BERT techniques over others in future work.

Paper Structure (22 sections, 1 figure, 2 tables)