Automated Bug Report Prioritization in Large Open-Source Projects

Riley Pierson, Armin Moin

TL;DR

The paper tackles automated bug prioritization in large open-source projects by combining topic modeling with per-topic text classification. It introduces a two-stage pipeline: topic modeling with TopicMiner-MTM, a variant of LDA, first groups bug reports into topics, and per-topic classifiers trained with BERT and Naïve Bayes then predict priority levels. On the Eclipse Platform dataset of 85,156 bug reports, the BERT-based approach outperforms baselines and several state-of-the-art methods in Accuracy, Precision, Recall, and F1-measure, which points to the value of topic-specific modeling for priority prediction. The work also offers an open-source prototype aimed at making bug triage more efficient in real-world open-source projects.
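
The two-stage idea can be illustrated with a minimal sketch, under strong simplifying assumptions. The topic stage here is a hypothetical keyword lookup (`TOPIC_KEYWORDS`, `assign_topic`) standing in for TopicMiner-MTM, and the classifier is a small hand-written multinomial Naïve Bayes rather than the authors' implementation. All names and the toy reports are illustrative and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


# Hypothetical stand-in for the topic-modeling stage.
TOPIC_KEYWORDS = {
    "ui": {"button", "dialog", "editor"},
    "build": {"compile", "build", "classpath"},
}


def assign_topic(text):
    """Pick the topic whose keyword set overlaps the report the most."""
    tokens = set(tokenize(text))
    return max(TOPIC_KEYWORDS, key=lambda t: len(tokens & TOPIC_KEYWORDS[t]))


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_per_topic(reports):
    """Stage 1: group reports by topic. Stage 2: fit one classifier per topic."""
    grouped = defaultdict(list)
    for text, priority in reports:
        grouped[assign_topic(text)].append((text, priority))
    return {
        topic: NaiveBayes().fit([t for t, _ in items], [p for _, p in items])
        for topic, items in grouped.items()
    }


def predict_priority(models, text):
    """Route a new report to the classifier of its topic."""
    return models[assign_topic(text)].predict(text)


# Toy training data (invented for illustration only).
reports = [
    ("editor crashes when dialog opens", "P1"),
    ("button label misaligned in dialog", "P4"),
    ("build fails with classpath error", "P2"),
    ("compile warning during build", "P4"),
]
models = train_per_topic(reports)
print(predict_priority(models, "editor crashes on button click"))
```

The point of the design is that each classifier only sees reports from one topic, so words can carry topic-specific weight for priority. The sketch does not handle a report whose topic had no training data.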

Abstract

Large open-source projects receive a large number of issues (commonly called bugs), including software defect reports and new feature requests, from their user and developer communities at a fast rate. Limited project resources often do not allow maintainers to deal with all issues. Instead, they have to prioritize issues according to the project's priorities and the issues' severities. In this paper, we propose a novel approach to automated bug prioritization based on the natural language text of the bug reports stored in the open bug repositories of issue-tracking systems. We conduct topic modeling using a variant of LDA called TopicMiner-MTM and text classification with the BERT large language model to achieve higher performance than the state of the art. Experimental results on an existing reference dataset containing 85,156 bug reports of the Eclipse Platform project indicate that we outperform existing approaches in terms of Accuracy, Precision, Recall, and F1-measure of bug report priority prediction.
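
For reference, the four reported metrics can be computed for a multi-class priority label as in the sketch below. The abstract does not state how per-class scores are averaged, so macro-averaging is an assumption here. The function name `macro_metrics` and the sample labels are illustrative only.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented labels for illustration; not results from the paper.
scores = macro_metrics(["P1", "P1", "P3", "P3"], ["P1", "P3", "P3", "P3"])
print(scores)
```

Macro-averaging weights every priority level equally, which matters when the priority distribution is skewed. A weighted or micro average would give different numbers on the same predictions.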

Paper Structure

This paper contains 20 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A shortened example of a Bugzilla bug report of the Eclipse WTP Java EE Tools project/product
  • Figure 2: Our BERT Approach Overview
  • Figure 3: Our Naïve Bayes Approach Overview
  • Figure 4: Priority level distribution of the Eclipse Platform project's bug reports dataset
  • Figure 5: Topic Distributions Using LDA