Can I Solve It? Identifying APIs Required to Complete OSS Tasks
Fabio Santos, Igor Wiese, Bianca Trinkenreich, Igor Steinmacher, Anita Sarma, Marco Gerosa
TL;DR
This study addresses the challenge of guiding OSS contributors to suitable tasks by automatically labeling issues with API-domain labels. It presents a three-phase methodology: mining JabRef to build ground-truth API-domain labels, constructing and evaluating multi-label TF-IDF-based classifiers (with Random Forest performing best), and conducting a developer study to assess label relevance. Results show that the classifier can predict API-domain labels with precision around 0.76 and recall around 0.75, and that API-domain labels significantly increase perceived usefulness for task selection, especially among industry practitioners and experienced developers. The work demonstrates practical potential for automating skill-directed task matching and outlines replication data and future directions, including validation on more projects and richer embedding-based techniques.
Abstract
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as bug or non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' descriptions and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels' relevance to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers find tasks that better match their skills.
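To make the reported precision and recall concrete for a multi-label setting, where each issue can carry several API-domain labels at once, the following minimal sketch computes both metrics over a handful of issues. It assumes micro-averaging (pooling true positives, false positives, and false negatives across all issues and labels), which may differ from the averaging used in the study. The function name `micro_precision_recall`, the label names, and the toy data are hypothetical and are not taken from the paper or its replication package.

```python
from typing import Iterable, Set, Tuple


def micro_precision_recall(
    true_labels: Iterable[Set[str]],
    predicted_labels: Iterable[Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall for multi-label predictions.

    Assumption: counts are pooled over all issues and labels before
    dividing; the study may have averaged differently.
    """
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        tp += len(truth & predicted)   # labels correctly predicted
        fp += len(predicted - truth)   # labels predicted but not in the ground truth
        fn += len(truth - predicted)   # ground-truth labels that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and predictions for three issues.
truth = [{"UI", "IO"}, {"Network"}, {"UI"}]
predicted = [{"UI"}, {"Network", "IO"}, {"UI", "Logging"}]

precision, recall = micro_precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, three predicted labels are correct, two are spurious, and one ground-truth label is missed, so precision is 3/5 and recall is 3/4. A label-by-label breakdown, rather than a pooled score, would be needed to reproduce "up to" figures such as those quoted in the abstract.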
