Prioritising GitHub Priority Labels

James Caddy, Christoph Treude

TL;DR

This paper addresses the lack of standardisation in GitHub issue priority labels by introducing a hand-curated data set of 812 priority-related labels, each categorised as High, Medium, or Low. It normalises disparate label scales, collects data from the 5,000 most-starred repositories, and evaluates inter-rater reliability to support the ranking decisions. Both the data set and a starter tool are released on Zenodo. The contributions enable cross-repository prioritisation, facilitate research on priority prediction, and give contributors a practical workflow for surfacing high-priority issues. The work also identifies limitations, including its English-language scope and inter-repository comparability, and suggests expanding the depth and breadth of priority labelling in future work.

Abstract

Communities on GitHub often use issue labels to triage issues, assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and are not standardised by GitHub. This makes priority-related reasoning across repositories difficult for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences of those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority, normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that creates a list of the highest-priority issues from the repositories to which they contribute. We have released the data set and the tool on Zenodo for anyone to use, in the hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
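
To illustrate the kind of use the abstract describes, the following is a minimal sketch of how a label-to-priority lookup might be applied to rank issues across repositories. It is not the authors' tool: the label strings in `LABEL_PRIORITY`, the `Issue` type, and the function names are hypothetical, and the real data set on Zenodo may be structured differently.

```python
from dataclasses import dataclass

# Hypothetical excerpt of a label-to-priority lookup. These entries are
# illustrative only and are NOT taken from the released data set.
LABEL_PRIORITY = {
    "priority: critical": "high",
    "p1": "high",
    "priority/medium": "medium",
    "p2": "medium",
    "low priority": "low",
    "p3": "low",
}

# Sort order for the three normalised ratings (smaller sorts first).
RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Issue:
    repo: str
    title: str
    labels: list


def issue_priority(issue):
    """Return the highest normalised priority among an issue's labels, or None."""
    found = [
        LABEL_PRIORITY[label.lower()]
        for label in issue.labels
        if label.lower() in LABEL_PRIORITY
    ]
    return min(found, key=RANK.__getitem__) if found else None


def rank_issues(issues):
    """Drop issues without a recognised priority label; sort the rest, highest first."""
    rated = [(issue_priority(issue), issue) for issue in issues]
    rated = [(priority, issue) for priority, issue in rated if priority is not None]
    return sorted(rated, key=lambda pair: RANK[pair[0]])


if __name__ == "__main__":
    issues = [
        Issue("a/x", "Typo in docs", ["Low Priority"]),
        Issue("b/y", "Crash on start", ["bug", "P1"]),
        Issue("c/z", "How do I install?", ["question"]),
    ]
    for priority, issue in rank_issues(issues):
        print(priority, issue.repo, issue.title)
```

Matching labels case-insensitively is a design choice of this sketch. Because repositories define their own labels, an exact lookup like this one only covers labels that already appear in the mapping.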

Paper Structure (7 sections, 2 figures, 5 tables)

This paper contains 7 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Normalised Mapping of Different Scales
  • Figure 2: Bar Chart of Rating Frequencies