HaPy-Bug -- Human Annotated Python Bug Resolution Dataset

Piotr Przymus, Mikołaj Fejzer, Jakub Narębski, Radosław Woźniak, Łukasz Halada, Aleksander Kazecki, Mykhailo Molchanov, Krzysztof Stencel

TL;DR

HaPy-Bug presents a Python bug-fix dataset of 793 commits annotated by three domain experts for file- and line-level aspects, enabling precise bug localization and analysis of bug-fix practices. The annotation workflow combines automatic lexical labeling with expert refinement, achieving strong inter-annotator agreement ($\kappa = 0.83$) and enabling reliable, scalable labeling across CVE-derived and crawled bugs. The dataset reveals distinct patterns across subsets and demonstrates that automated labeling can cover most lines, while human review handles non-trivial cases; it supports bug localization, tooling for repository analysis, and evaluation of LLMs for code understanding and repair. Publicly available, HaPy-Bug provides a valuable resource for software maintenance, security research, and the development of advanced annotation and analysis tools.
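
To make the reported agreement figure concrete, the following is a minimal sketch of how agreement among three annotators can be computed with Fleiss' kappa. It rests on assumptions: the summary above does not say which kappa variant the authors used, and the function name `fleiss_kappa` and the labels `"bug"`, `"doc"`, and `"test"` are hypothetical placeholders, not the dataset's actual label set.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of per-item label lists, e.g. one inner list of three
    labels per annotated line. The statistic choice is an assumption; the
    source only reports a kappa value.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Per-item category counts and overall category totals.
    counts = [Counter(r) for r in ratings]
    totals = Counter()
    for c in counts:
        totals.update(c)

    # Mean observed agreement across items.
    p_bar = sum(
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Agreement expected by chance from the marginal label proportions.
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        # Only one label was ever used; agreement is trivially perfect.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical line-level labels from three annotators (not real dataset values).
example = [
    ["bug", "bug", "bug"],
    ["doc", "doc", "doc"],
    ["test", "test", "bug"],
]
kappa = fleiss_kappa(example)
```

Values close to 1 indicate near-complete agreement, while values at or below 0 indicate agreement no better than chance.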

Abstract

We present HaPy-Bug, a curated dataset of 793 Python source code commits associated with bug fixes, with each line of code annotated by three domain experts. The annotations offer insights into the purpose of modified files, changes at the line level, and reviewers' confidence levels. We analyze HaPy-Bug to examine the distribution of file purposes, types of modifications, and tangled changes. Additionally, we explore its potential applications in bug tracking, the analysis of bug-fixing practices, and the development of repository analysis tools. HaPy-Bug serves as a valuable resource for advancing research in software maintenance and security.

Paper Structure

This paper contains 11 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: The process of dataset creation
  • Figure 2: Dataset analysis