An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software

Aaditya Bhatia; Foutse Khomh; Bram Adams; Ahmed E Hassan

TL;DR

The paper empirically characterizes self-admitted technical debt (SATD) in ML software by analyzing 318 ML and 318 non-ML GitHub projects. It extends the existing SATD taxonomy with ML-specific debt types (Configuration Debt, Inadequate Testing Debt) and maps SATD to five ML pipeline stages, revealing Model Building and Data Preprocessing as hotspots. Survival analysis shows that SATD is introduced far earlier and removed much faster in ML projects, yet persists in certain small, low-complexity files when changes are large. The study provides actionable insights for ML stakeholders, including governance of configurations, testing practices, and targeted debt remediation, supported by replication data and SHAP-based explanations. Overall, the work highlights the distinct debt dynamics in ML software and offers a foundation for better technical debt management in the ML era.

Abstract

The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algorithms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to comprehend the nature of technical debt in the ML code, and performed a survival analysis of the SATD to understand the evolution of such debts. We observed: i) Machine learning projects have a median percentage of SATD that is twice the median percentage of SATD in non-machine learning projects. ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components. iii) SATDs appear in ML projects earlier in the development process compared to non-ML projects. iv) Long-lasting SATDs are typically introduced during extensive code changes that span multiple files exhibiting low complexity.
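The survival analysis mentioned in the abstract can be illustrated with a minimal Kaplan-Meier sketch. This is only an illustration under stated assumptions: the abstract does not say which estimator or tooling the authors used, and the function name `kaplan_meier`, the day-based durations, and the toy data below are all hypothetical. A comment still present at the last snapshot is treated as censored (its removal was never observed).

```python
from collections import Counter


def kaplan_meier(durations, removed):
    """Estimate the survival curve of SATD comments (hypothetical helper).

    durations: days each SATD comment was observed in the code base.
    removed:   True if the comment was removed (event observed),
               False if it was still present at the last snapshot (censored).
    Returns a list of (time, survival_probability) pairs, one per time
    at which at least one removal was observed.
    """
    events = Counter(t for t, r in zip(durations, removed) if r)
    exits = Counter(durations)  # removals and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk comments that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (made up for illustration, not taken from the paper).
durations = [5, 10, 10, 20, 30]
removed = [True, True, False, True, False]
curve = kaplan_meier(durations, removed)
print([(t, round(s, 2)) for t, s in curve])
```

Comparing such curves for ML and non-ML projects is one way to express how quickly SATD is removed in each group; censoring matters because many comments outlive the observation window.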

Paper Structure (49 sections, 1 equation, 15 figures, 7 tables)

Introduction
Related work
Studies on Software Engineering for Machine Learning
Empirical Studies on SATD in Traditional (Non-ML) Software
Studies Presenting SATD Detection Tools
Research Questions
RQ1: What is the prevalence of SATD in ML-based systems?
RQ2: What are the different types of SATD in ML-based systems?
RQ3: Which stages of the ML pipeline are more prone to SATD?
RQ4: How long does SATD survive in ML-based systems?
RQ5: What are the characteristics of long-lasting SATDs in ML-based systems?
Case Study Setup
Selection of ML Projects
Choice of ML Domains
Keyword-Based Searching to get Candidate Repositories
...and 34 more sections

Figures (15)

Figure 1: Example of SATD in a Natural Language Processing application, Mead-ML.
Figure 2: Data collection and processing steps.
Figure 3: Distribution of the number of lines of code for different project domains.
Figure 4: Project history lifetimes, commits, and churn per commit for ML and non-ML projects of our dataset.
Figure 5: Percentage of SATD in ML and non-ML projects. The comparison has been made by collecting 424,248 bootstrapped samples from ML and non-ML comments and repeating the process 1,000 times (to ensure robustness).
...and 10 more figures
