Exploring Scientific Debt: Harnessing AI for SATD Identification in Scientific Software

Eric L. Melin; Ahmed Musa Awon; Nasir U. Eisty; Neil A. Ernst; Shurui Zhou

TL;DR

This paper investigates Self-Admitted Technical Debt (SATD) in Scientific Software (SSW) and how its patterns differ from those in general-purpose open-source software (OSS). It combines the SATDAUG dataset with a Scientific Debt dataset into a merged corpus of 67,066 labeled comments, which is used to fine-tune ten transformer models (including BERT, CodeBERT, RoBERTa, Mistral, Llama-2, DeepSeek-Qwen, and T5) to identify and classify SATD types. The study finds that SSW contains 9.25x more Scientific Debt and 4.93x more SATD overall than general-purpose OSS, underscoring domain-specific maintenance challenges. The best-performing models (BERT-large for intra-project and Llama-2-7B for cross-project evaluation) achieve state-of-the-art detection performance. Applying the model to 27 repositories confirms substantial SATD in SSW, with Scientific Debt dominating in SSW domains, which highlights the need for domain-aware SATD management tools that preserve scientific validity and reproducibility. The work provides a benchmark dataset, demonstrates the effectiveness of transformer-based SATD detection in SSW, and points to future work on expanding the datasets and on longitudinal analyses that track how SATD evolves in scientific computing.

Abstract

Developers often leave behind clues in their code, admitting where it falls short, known as Self-Admitted Technical Debt (SATD). In the world of Scientific Software (SSW), where innovation moves fast and collaboration is key, such debt is not just common but deeply impactful. As research relies on accurate and reproducible results, accumulating SATD can threaten the very foundations of scientific discovery. Yet, despite its significance, the relationship between SATD and SSW remains largely unexplored, leaving a crucial gap in understanding how to manage SATD in this critical domain. This study explores SATD in SSW repositories, comparing SATD in scientific versus general-purpose open-source software and evaluating transformer-based models for SATD identification. We analyzed SATD in 27 scientific and general-purpose repositories across multiple domains and languages. We fine-tuned and compared 10 transformer-based models (100M-7B parameters) on 67,066 labeled code comments. SSW contains 9.25x more Scientific Debt and 4.93x more SATD than general-purpose software due to complex computations, domain constraints, and evolving research needs. Furthermore, our best model outperforms existing ones. This study uncovers how SATD in SSW differs from general software, revealing its impact on quality and scientific validity. By recognizing these challenges, developers and researchers can adopt smarter strategies to manage debt and safeguard the integrity of scientific discovery.
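As a rough illustration of how a ratio such as "4.93x more SATD" can be read, the sketch below computes a prevalence ratio between two corpora. It assumes prevalence means the share of comments labeled as debt, which the abstract does not spell out. The `Corpus` class, the `prevalence_ratio` function, and all counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    """Hypothetical summary of one group of repositories."""
    name: str
    total_comments: int   # all code comments examined
    debt_comments: int    # comments a classifier labeled as debt

    @property
    def prevalence(self) -> float:
        """Share of comments labeled as debt (assumed definition)."""
        if self.total_comments <= 0:
            raise ValueError("total_comments must be positive")
        return self.debt_comments / self.total_comments


def prevalence_ratio(a: Corpus, b: Corpus) -> float:
    """How many times more prevalent debt is in corpus a than in corpus b."""
    if b.debt_comments == 0:
        raise ValueError("reference corpus has no debt comments")
    return a.prevalence / b.prevalence


# Illustrative counts only; these are NOT the paper's data.
ssw = Corpus("scientific", total_comments=10_000, debt_comments=500)
oss = Corpus("general-purpose", total_comments=20_000, debt_comments=200)

ratio = prevalence_ratio(ssw, oss)
print(f"{ssw.name} vs {oss.name}: {ratio:.2f}x")
```

Normalizing by the total comment count keeps the comparison fair when the two groups of repositories differ in size; comparing raw counts would not.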
