QualiTagger: Automating software quality detection in issue trackers

Karthik Shivashankar, Rafael Capilla, Maren Maritsdatter Kruke, Mili Orucevic, Antonio Martini

TL;DR

This paper presents QualiTagger, a transformer-based approach to automatically identify software quality attributes in issue-tracker text, underpinned by a large curated dataset, QualiDataSet, mined from thousands of GitHub projects. The authors implement an ensemble of binary DistilRoBERTa classifiers, compare binary versus multiclass setups, and benchmark both against large language models like GPT-4o, reporting strong performance and robust out-of-distribution generalization across seven quality attributes aligned to the SQuaRE ISO standard. They validate the method through case studies, including industry data from Visma and a software-engineering student project, and show practical applicability for prioritizing technical debt and guiding quality-focused decisions. The work provides a scalable, domain-adaptive tool for automatic quality-attribute tagging in issue trackers, offering insights for researchers and practitioners and enabling large-scale studies of quality dynamics across repositories and programming languages.
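
To make the "ensemble of binary classifiers" idea concrete, here is a minimal sketch of how one binary model per quality attribute can produce multi-label tags for a single issue. It is not the authors' implementation: the keyword-based `keyword_stub` is a hypothetical stand-in for a fine-tuned DistilRoBERTa model, the function names and the 0.5 threshold are assumptions, and the two attribute names are placeholders (the summary does not list the seven attributes).

```python
from typing import Callable, Dict, List

# A binary classifier maps issue text to a score in [0, 1] for one attribute.
BinaryClassifier = Callable[[str], float]


def keyword_stub(keywords: List[str]) -> BinaryClassifier:
    """Hypothetical stand-in for a fine-tuned binary model.

    Returns 1.0 if any keyword occurs in the text, else 0.0.
    A real ensemble member would return a model probability instead.
    """
    def predict(text: str) -> float:
        lowered = text.lower()
        return 1.0 if any(k in lowered for k in keywords) else 0.0
    return predict


def tag_issue(text: str,
              ensemble: Dict[str, BinaryClassifier],
              threshold: float = 0.5) -> List[str]:
    """Run every binary classifier independently and keep the attributes
    whose score reaches the (assumed) threshold, so one issue can receive
    zero, one, or several tags."""
    return sorted(name for name, clf in ensemble.items()
                  if clf(text) >= threshold)


# Placeholder attributes and keywords, chosen only for illustration.
ENSEMBLE: Dict[str, BinaryClassifier] = {
    "security": keyword_stub(["vulnerab", "xss", "injection"]),
    "performance": keyword_stub(["slow", "latency", "timeout"]),
}

if __name__ == "__main__":
    print(tag_issue("Login page is slow and open to SQL injection", ENSEMBLE))
```

Because each classifier decides independently, an issue that touches several qualities receives several tags, which a single multiclass model that picks exactly one label cannot express.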

Abstract

A system's quality is a major concern for development teams as the system evolves. Understanding the effects of a loss of quality in the codebase is crucial to avoid side effects like the appearance of technical debt. Although the identification of these qualities in software requirements described in natural language has been investigated, most of the results are often not applicable in practice, and have been validated only on small datasets and a limited number of projects. For many years, machine learning (ML) techniques have proven valid for identifying and tagging terms described in natural language. To advance previous work, in this research we use cutting-edge models like Transformers, together with a vast dataset mined and curated from GitHub, to identify what text is usually associated with different quality properties. We also study the distribution of such qualities in issue trackers from openly accessible software repositories, and we evaluate our approach both with students from a software engineering course and through its application to recognizing security labels in industry.

Paper Structure

This paper contains 35 sections, 4 equations, 1 figure, 17 tables.

Figures (1)

  • Figure 1: Research process showing the sources, the steps, the results related to the various RQs, the novel contributions (black boxes), and finally both a usage showcase of our tool and its evaluation in practice