Unravelling Technical debt topics through Time, Programming Languages and Repository

Karthik Shivashankar, Antonio Martini

TL;DR

This study investigates how Technical Debt (TD) topics diversify and evolve across time, programming languages, and repositories by analyzing GitHub issues from 2015 to September 2023. It combines BERTopic-based topic modelling with VADER sentiment analysis to identify TD themes and gauge developers' attitudes toward them, and it examines language- and repository-specific subsets such as TypeScript and VSCode. Key findings include recurring TD themes such as testing, UI components, and refactoring, with observable shifts over time and language-specific patterns, alongside per-topic sentiment scores. The work delivers a reproducible framework and practical guidance for TD management in real-world software projects, with potential for broader adoption and extension.

Abstract

This study explores the dynamic landscape of Technical Debt (TD) topics in software engineering by examining their evolution across time, programming languages, and repositories. Despite extensive research on identifying and quantifying TD, there remains a significant gap in understanding the diversity of TD topics and their temporal development. To address this, we conducted an exploratory analysis of TD data extracted from GitHub issues spanning 2015 to September 2023. We employed BERTopic for topic modelling to categorise the TD topics and track their progression over time. Furthermore, we incorporated sentiment analysis for each identified topic, providing deeper insight into the perceptions and attitudes associated with these topics. Together, these analyses offer a more nuanced understanding of the trends and shifts in TD topics through time, programming language, and repository.
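
The two aggregations described above (topic frequency over time and sentiment per topic) can be illustrated with a minimal sketch. It assumes each issue has already been assigned a topic label (as BERTopic would do) and a compound sentiment score in [-1, 1] (as VADER would produce). The records, function names, and values below are hypothetical placeholders, not data or code from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, topic label, compound sentiment score in [-1, 1]).
# In the study the labels would come from BERTopic and the scores from VADER;
# these hard-coded values are placeholders, not results from the paper.
issues = [
    (2015, "testing", -0.25),
    (2015, "refactoring", 0.10),
    (2016, "testing", 0.05),
    (2016, "testing", -0.40),
    (2016, "ui components", 0.30),
]


def topic_counts_by_year(records):
    """Count how many issues fall under each topic in each year."""
    counts = defaultdict(lambda: defaultdict(int))
    for year, topic, _score in records:
        counts[topic][year] += 1
    # Convert to plain dicts with years in ascending order.
    return {topic: dict(sorted(by_year.items())) for topic, by_year in counts.items()}


def mean_sentiment_by_topic(records):
    """Average the compound sentiment score over all issues of each topic."""
    scores = defaultdict(list)
    for _year, topic, score in records:
        scores[topic].append(score)
    return {topic: mean(values) for topic, values in scores.items()}


if __name__ == "__main__":
    print(topic_counts_by_year(issues))
    print(mean_sentiment_by_topic(issues))
```

Filtering `issues` by language or repository before calling these functions would give the corresponding per-language or per-repository views.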

Paper Structure

This paper contains 15 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Main topics from the entire dataset and their evolution over time
  • Figure 2: Main topics from the filtered TypeScript programming language dataset and their evolution over time
  • Figure 3: Main topics from the filtered VSCode repository and their evolution over time
  • Figure 4: VADER compound sentiment score for each top topic in the entire dataset
  • Figure 5: VADER compound sentiment score for each top topic in the filtered TypeScript dataset