Detection, Classification and Prevalence of Self-Admitted Aging Debt
Murali Sridharan, Mika Mäntylä, Leevi Rantala
TL;DR
This study introduces Self-Admitted Aging Debt (SAAD), a software-aging signal evidenced in developer comments, and develops a taxonomy separating Active and Dormant aging. Using a mixed-methods, sequential exploratory design, the authors extract 145 aging-related text features with Sense2Vec, derive 399 SAAD patterns, and build gold (2,562 comments) and silver (35,630 comments) SAAD datasets from the PENTACET corpus of 9,000+ OSS Java repositories. The prevalence analysis shows Dormant SAAD dominates OSS aging signals, with Deprecation SAAD being the major contributor, and demonstrates strong cross-dataset consistency via statistical triangulation. The work provides a foundational dataset and taxonomy to enable proactive maintenance and cross-language extensions, advancing the understanding of evolutionary software aging beyond traditional Technical Debt analyses.
Abstract
Context: Previous research on software aging has focused mainly on dynamic runtime indicators such as memory and performance, often neglecting evolutionary indicators such as source code comments and examining legacy issues only narrowly within the Technical Debt (TD) context. Objective: We introduce the concept of Aging Debt (AD), which represents the increased maintenance effort and cost needed to keep software up to date. We study AD through Self-Admitted Aging Debt (SAAD) observed in source code comments left by software developers. Method: We employ a mixed-methods approach, combining qualitative and quantitative analyses to detect and measure AD in software. This includes framing SAAD patterns from source code comments after analysing the surrounding source code context, then using those patterns to detect SAAD comments. In the process, we develop a taxonomy for SAAD that reflects the temporal aging of software and its associated debt. We then use the taxonomy to quantify the different types of AD prevalent in OSS repositories. Results: Our proposed taxonomy categorizes temporal software aging into Active and Dormant types. Our analysis of more than 9,000 Open Source Software (OSS) repositories reveals that more than 21% of repositories exhibit signs of SAAD, as observed in our gold standard SAAD dataset. Notably, Dormant AD emerges as the predominant category, highlighting a critical but often overlooked aspect of software maintenance. Conclusion: As software volume grows annually, so do evolutionary aging and maintenance challenges; our proposed taxonomy can aid researchers in detailed software aging studies and help practitioners develop improved and proactive maintenance strategies.
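The pattern-based detection and prevalence measurement described in the Method can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the two regular expressions are hypothetical placeholders rather than any of the study's SAAD patterns, the assignment of the second pattern to the Active category is an assumption made only for illustration, and the names `PATTERNS`, `classify`, and `prevalence` are invented here. Only the placement of deprecation under Dormant follows the summary above.

```python
import re
from collections import defaultdict

# Hypothetical placeholder patterns, NOT the SAAD patterns derived in the study.
# "deprecated" is placed under Dormant because the summary names Deprecation
# SAAD as a Dormant contributor; the Active pattern is an illustrative guess.
PATTERNS = {
    "Dormant": [re.compile(r"\bdeprecated\b", re.IGNORECASE)],
    "Active": [re.compile(r"\bno longer (?:works|supported)\b", re.IGNORECASE)],
}


def classify(comment):
    """Return the first category whose pattern matches the comment, else None."""
    for category, regexes in PATTERNS.items():
        if any(rx.search(comment) for rx in regexes):
            return category
    return None


def prevalence(comments_by_repo):
    """Compute the share of repositories with at least one matching comment,
    together with the number of matching comments per category."""
    counts = defaultdict(int)
    flagged_repos = 0
    for comments in comments_by_repo.values():
        labels = [label for label in map(classify, comments) if label is not None]
        if labels:
            flagged_repos += 1
        for label in labels:
            counts[label] += 1
    share = flagged_repos / len(comments_by_repo) if comments_by_repo else 0.0
    return share, dict(counts)
```

Calling `prevalence` on a mapping from repository name to a list of comment strings returns the fraction of flagged repositories and the per-category counts, which mirrors in miniature how a repository-level prevalence figure and a category breakdown could be obtained.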
