Tag that issue: Applying API-domain labels in issue tracking systems

Fabio Santos; Joseph Vargovich; Bianca Trinkenreich; Italo Santos; Jacob Penney; Ricardo Britto; João Felipe Pimentel; Igor Wiese; Igor Steinmacher; Anita Sarma; Marco A. Gerosa

TL;DR

The paper investigates automatically labeling OSS issues with API-domain categories to signal the skills required for solving them. It combines a user study showing API-domain labels aid newcomers in task selection with a scalable multi-label prediction pipeline (TF-IDF and BERT) trained on 22,231 issues from five projects and 31 API-domain labels. The results demonstrate high per-project precision (around 0.86) and substantial recall (around 0.79) for predicting API-domain labels, with transfer learning showing more variable performance but potential for cross-project applicability. Developer validation indicates most predicted labels align with the needed skills, supporting practical deployment to guide contributors and maintenance planning. Overall, API-domain labeling offers a promising path to assist onboarding and task allocation in OSS, albeit with considerations for label overload and transfer-learning dynamics.

Abstract

Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can be a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.
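To make the reported precision and recall figures concrete, the sketch below shows one common way such scores can be computed when each issue carries a set of labels. It is an illustration only: the micro-averaging choice, the function name `micro_precision_recall`, and the toy issues are assumptions, not the paper's evaluation code, and the paper may average differently.

```python
from typing import List, Set, Tuple


def micro_precision_recall(
    true_labels: List[Set[str]],
    predicted_labels: List[Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over multi-label predictions.

    Hypothetical helper: counts are pooled across all issues before dividing.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and actually required
        fp += len(pred - truth)   # labels predicted but not required
        fn += len(truth - pred)   # required labels the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented for illustration): three issues with API-domain labels.
truth = [{"DB", "UI"}, {"Security"}, {"UI"}]
pred = [{"DB"}, {"Security", "UI"}, {"UI"}]

precision, recall = micro_precision_recall(truth, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, three predicted labels are correct, one is spurious, and one required label is missed, so both scores come out to 0.75.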

Paper Structure (26 sections, 5 equations, 14 figures, 26 tables)

Introduction
Related Work
Method Overview
Relevance of the Labels to New Contributors (RQ1)
Method
Participants
Experiment Planning
Questionnaire Data Collection
Questionnaire Data Analysis
Results
Label Predictions (RQ2)
Method
Phase 1 - Mining Software Repositories
Phase 2 - API classification
Phase 3 - Building the Multi-label Classifiers
...and 11 more sections

Figures (14)

Figure 1: Research method overview
Figure 2: Questionnaire question about the relevance of the page regions for task selection
Figure 3: The region counts (normalized) of the issue's information page selected as most relevant by participants from treatment and control groups.
Figure 4: The y-axis shows the probability density and the median for API-domain labels (API) vs. component labels (Comp) vs. type labels.
Figure 5: The information reported by contributors as relevant to choosing a task. We mapped the categories of our participants' definitions (rounded squares) to the 5W2H framework [klock20165w2h], which organizes information for decision-making across seven questions.
...and 9 more figures
