Different Debt: An Addition to the Technical Debt Dataset and a Demonstration Using Developer Personality

Lorenz Graf-Vlachy, Stefan Wagner

TL;DR

This paper addresses gaps in the technical debt (TD) data of the Technical Debt Dataset (TDD): SonarQube results are missing for many commits, for instance because the commits failed to compile. The authors introduce an addition to the TDD, produced with Teamscale, that covers essentially all commits across 37 Java projects up to October 2023. They describe how the addition was constructed and provide a repository of per-commit data with CSV outputs (report, findings, findings_messages) that link to the TDD, which enables reproducible TD analyses across branches. To demonstrate the dataset's utility, they replicate and extend a study of developer personality and TD, applying panel regression to 5,497 commits from 111 developers. Many of the personality–TD associations differ from those reported in prior work, although age at commit remains a consistent predictor. The result is a fine-grained, extensible dataset. The replication also shows that large, diverse samples and alternative TD metrics matter for drawing robust conclusions about how developer characteristics relate to TD, with practical implications for research reproducibility and TD management.
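The CSV outputs are described as linking to the TDD. The following is a minimal sketch of how such a link might be used. It assumes, hypothetically, that the findings file has one row per finding and a `commit_hash` column whose values match the commit hashes stored in the TDD. The column names, project name, and hashes are illustrative placeholders, not the dataset's actual schema. The sketch counts findings per TDD commit using only the Python standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a findings CSV; the real file's columns may differ.
FINDINGS_CSV = """project,commit_hash,finding_id,category
example-project,a1b2c3,F1,Code Duplication
example-project,a1b2c3,F2,Structure
example-project,d4e5f6,F1,Code Duplication
"""

# Hypothetical set of commit hashes taken from the TDD's commit table.
TDD_COMMITS = {"a1b2c3", "d4e5f6", "0f9e8d"}


def findings_per_commit(findings_csv: str, tdd_commits: set) -> dict:
    """Count finding rows per commit, keeping only commits that appear in the TDD.

    Commits present in the TDD but without any finding rows get a count of 0.
    The result is ordered by commit hash.
    """
    counts = Counter(
        row["commit_hash"]
        for row in csv.DictReader(io.StringIO(findings_csv))
        if row["commit_hash"] in tdd_commits
    )
    return {commit: counts.get(commit, 0) for commit in sorted(tdd_commits)}


if __name__ == "__main__":
    print(findings_per_commit(FINDINGS_CSV, TDD_COMMITS))
```

Running the script prints `{'0f9e8d': 0, 'a1b2c3': 2, 'd4e5f6': 1}`. In real data, a commit with no finding rows might also be a commit that was never analyzed. Before treating missing rows as zero, it is worth checking the report output, or an equivalent record of analyzed commits.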

Abstract

Background: The "Technical Debt Dataset" (TDD) is a comprehensive dataset on technical debt (TD) in the main branches of more than 30 Java projects. However, for many commits some of the TD items produced by SonarQube are missing, for instance because the commits failed to compile. This has limited previous studies using the dataset. Aims and Method: In this paper, we provide an addition to the dataset that includes a Teamscale analysis of 278,320 commits on all branches of 37 projects, a superset of the projects in the TDD. We then demonstrate the utility of the dataset by replicating a prior study on the relationship between developer personality and TD. Results: The new dataset allows us to use a larger sample than prior work could; we analyze the personality of 111 developers and 5,497 of their commits. The relationships we find between developer personality and the introduction and removal of TD differ from those found in prior work. Conclusions: We offer a dataset that may enable future studies on TD, and we provide additional insights into how developer personality relates to TD.
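The abstract refers to the introduction and removal of TD by individual commits. One common way to operationalize this, assumed here rather than taken from the paper, is to compare the findings present in a commit with those in its parent. Findings that appear only in the child count as introduced, and findings that appear only in the parent count as removed. The sketch further assumes that each finding has an identifier that stays stable across commits. The finding IDs and the three-commit history are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical linear history: commit -> (parent commit or None, IDs of findings present).
HISTORY: Dict[str, Tuple[Optional[str], Set[str]]] = {
    "c1": (None, {"F1", "F2"}),
    "c2": ("c1", {"F1", "F3", "F4"}),
    "c3": ("c2", {"F3"}),
}


def td_delta(parent_findings: Set[str], child_findings: Set[str]) -> Tuple[int, int]:
    """Return (introduced, removed) finding counts between a parent commit and its child."""
    introduced = len(child_findings - parent_findings)  # present only in the child
    removed = len(parent_findings - child_findings)  # present only in the parent
    return introduced, removed


def deltas(history: Dict[str, Tuple[Optional[str], Set[str]]]) -> Dict[str, Tuple[int, int]]:
    """Compute (introduced, removed) for every commit that has a parent in the history."""
    result = {}
    for commit, (parent, findings) in history.items():
        if parent is None:
            continue  # a root commit has no parent to compare against
        result[commit] = td_delta(history[parent][1], findings)
    return result


if __name__ == "__main__":
    print(deltas(HISTORY))
```

Running the script prints `{'c2': (2, 1), 'c3': (0, 2)}`. Real histories contain merge commits with several parents, which this linear sketch does not handle.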

Paper Structure (20 sections, 2 tables)