SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub

Benjamin C. Carter, Jonathan Rivas Contreras, Carlos A. Llanes Villegas, Pawan Acharya, Jack Utzerath, Adonijah O. Farner, Hunter Jenkins, Dylan Johnson, Jacob Penney, Igor Steinmacher, Marco A. Gerosa, Fabio Santos

TL;DR

SkillScope tackles onboarding bottlenecks in open source software (OSS) projects by predicting fine-grained, multilevel skill labels (domains and subdomains) for GitHub issues. It combines abstract syntax tree (AST) analysis of Java code with Random Forest (RF) and fine-tuned GPT models to label issues across up to 217 domains, reaching a precision of up to 0.908, a recall of up to 0.876, and an F1 of up to 0.889, and outperforming LLM-only approaches. The tool ships with a Django-based UI and replication packages, so practitioners can assign tasks more effectively and speed up onboarding in OSS projects. By extending prior API-domain labeling to a broader, multilevel taxonomy, SkillScope gives contributors and maintainers more actionable guidance, and it leaves room for future expansion and explainability enhancements.

Abstract

New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty level, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve the open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects.
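To make the reported averages concrete, the following is a minimal sketch of how precision, recall, and F-measure can be computed for multi-label skill predictions. It assumes macro averaging over labels, which the abstract does not specify; the label names, the toy issues, and the helper functions `per_label_scores` and `macro_average` are hypothetical and are not taken from SkillScope. The sketch also illustrates one possible reason why an averaged F-measure (0.889) need not equal the harmonic mean of the averaged precision and recall (about 0.892 for 0.908 and 0.876).

```python
from typing import Dict, List, Set


def per_label_scores(y_true: List[Set[str]], y_pred: List[Set[str]],
                     label: str) -> Dict[str, float]:
    """Precision, recall, and F1 for a single skill label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_average(y_true: List[Set[str]], y_pred: List[Set[str]],
                  labels: List[str]) -> Dict[str, float]:
    """Unweighted mean of the per-label scores (assumed averaging scheme)."""
    scores = [per_label_scores(y_true, y_pred, label) for label in labels]
    return {key: sum(s[key] for s in scores) / len(scores)
            for key in ("precision", "recall", "f1")}


# Hypothetical skill labels and toy issues, for illustration only.
labels = ["UI", "Database", "Networking"]
y_true = [{"UI"}, {"UI", "Database"}, {"Database"}, {"Networking"}]
y_pred = [{"UI"}, {"UI"}, {"Database", "UI"}, {"Networking"}]

macro = macro_average(y_true, y_pred, labels)

# Harmonic mean of the averaged precision and recall, for comparison
# with the average of the per-label F1 scores.
harmonic = (2 * macro["precision"] * macro["recall"]
            / (macro["precision"] + macro["recall"]))

print(macro)
print(round(harmonic, 4))
```

In this toy data the averaged per-label F1 is lower than the harmonic mean of the averaged precision and recall, because labels with unbalanced precision and recall pull the per-label F1 down.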

Paper Structure

This paper contains 12 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: SkillScope Architecture
  • Figure 2: Evaluation