Unravelling Technical Debt Topics through Time, Programming Languages and Repository
Karthik Shivashankar, Antonio Martini
TL;DR
This study investigates how Technical Debt (TD) topics diversify and evolve across time, programming languages, and repositories by analysing GitHub issues from 2015 to September 2023. It combines BERTopic-based topic modelling with VADER sentiment analysis to identify TD themes and gauge developers' attitudes, and it links topics to specific languages and repositories such as TypeScript and VSCode. Key findings include recurring TD themes such as testing, UI components, and refactoring, with observable shifts over time and language-specific patterns, accompanied by per-topic sentiment insights. The work delivers a reproducible framework and practical guidance for TD management in real-world software projects, with potential for broader adoption and extension.
Abstract
This study explores the dynamic landscape of Technical Debt (TD) topics in software engineering by examining their evolution across time, programming languages, and repositories. Despite extensive research on identifying and quantifying TD, there remains a significant gap in understanding the diversity of TD topics and their temporal development. To address this, we conducted an exploratory analysis of TD data extracted from GitHub issues spanning 2015 to September 2023. We employed BERTopic for topic modelling to categorise the TD topics and track their progression over time. Furthermore, we applied sentiment analysis to each identified topic, providing deeper insight into the perceptions and attitudes associated with these topics. This offers a more nuanced understanding of the trends and shifts in TD topics across time, programming languages, and repositories.
