Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems
Tao Xiao, Zhili Zeng, Dong Wang, Hideaki Hata, Shane McIntosh, Kenichi Matsumoto
TL;DR
This paper investigates how Self-Admitted Technical Debt (SATD) propagates through build systems by examining 50,608 SATD comments across Autotools, CMake, Maven, and Ant. It combines keyword-based SATD extraction, sentence-embedding similarity, and DBSCAN clustering to identify SATD clones, validates the results manually, and then analyzes authorship and the statements surrounding each clone. The study finds that SATD clones are highly prevalent (62–95% of SATD comments), that the statements surrounding clones are highly similar (76% of occurrences score above 0.8), and that about a quarter of clones are introduced by the author of the original SATD. It also contributes a taxonomy of clone locations, reasons, and purposes. The findings highlight the risk of debt propagation in build processes and lay the groundwork for automated SATD repayment and greater SATD awareness in software engineering practice.
Abstract
Self-Admitted Technical Debt (SATD) annotates development decisions that intentionally exchange long-term software artifact quality for short-term goals. Recent work explores the existence of SATD clones (duplicate or near-duplicate SATD comments) in source code. Cloning of SATD in build systems (e.g., CMake and Maven) may propagate suboptimal design choices, threatening qualities of the build system that stakeholders rely upon (e.g., maintainability, reliability, repeatability). Hence, we conduct a large-scale study of 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant build systems to investigate the prevalence of SATD clones and to characterize their occurrences. We observe that: (i) prior work suggests that 41–65% of SATD comments in source code are clones, but in our studied build system context, the rates range from 62% to 95%, suggesting that SATD clones are more prevalent in build systems than in source code; (ii) statements surrounding SATD clones are highly similar, with 76% of occurrences having similarity scores greater than 0.8; (iii) a quarter of SATD clones are introduced by the author of the original SATD statements; and (iv) among the most commonly cloned SATD comments, external factors (e.g., platform and tool configuration) are the most frequent locations, limitations in tools and libraries are the most frequent causes, and developers often copy SATD comments that describe issues to be fixed later. Our work presents a first step toward systematically understanding SATD clones in build systems and opens up avenues for future work, such as distinguishing different SATD clone behaviors and designing an automated recommendation system for repaying SATD effectively based on resolved clones.
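To make the clone-detection idea concrete, the following is a minimal sketch of a keyword-filter, embed, and cluster pipeline. It is an illustration under stated assumptions and not the study's implementation: the keyword list, the example comments, and the `eps` and `min_samples` values are hypothetical, a bag-of-words count vector stands in for a learned sentence embedding, and the small `dbscan` function is a simplified pure-Python version of the algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; the study's actual SATD keywords are not given here.
SATD_KEYWORDS = ("todo", "fixme", "hack", "workaround", "xxx")


def is_satd(comment):
    """Keyword-based check: does the comment contain an SATD marker word?"""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return any(keyword in words for keyword in SATD_KEYWORDS)


def embed(comment):
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", comment.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)


def dbscan(vectors, eps, min_samples):
    """Minimal DBSCAN; returns one label per vector, with -1 meaning noise."""
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


# Hypothetical build-file comments, for illustration only.
comments = [
    "# TODO: remove this workaround once CMake 3.20 is required",
    "# TODO: remove this workaround once CMake 3.21 is required",
    "# FIXME: hardcoded path to the Java compiler",
    "# Set the project version",
]

satd = [c for c in comments if is_satd(c)]
labels = dbscan([embed(c) for c in satd], eps=0.2, min_samples=2)

for label, comment in zip(labels, satd):
    print(label, comment)
```

Comments that share a non-negative label form one candidate clone group, while a label of -1 marks an SATD comment with no near duplicate. In this sketch the two near-identical TODO comments fall into the same group and the FIXME comment is left as noise; a real pipeline would still need manual validation of the resulting groups.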
