WIA-SZZ: Work Item Aware SZZ

Salomé Perez-Rosero, Robert Dyer, Samuel W. Flint, Shane McIntosh, Witawas Srisa-an

TL;DR

The authors propose a heuristic that, given an input commit, uses information about changed methods to identify related commits that form a work item with that commit, and hypothesize that such a work item identifying heuristic can identify bug-inducing commits more accurately than existing SZZ approaches.

Abstract

Many software engineering maintenance tasks require linking a commit that induced a bug with the commit that later fixed that bug. Several existing SZZ algorithms provide a way to identify the potential commit that induced a bug when given a fixing commit as input. Prior work introduced the notion of a "work item", a logical grouping of commits that could be a single unit of work. Our key insight in this work is to recognize that a bug-inducing commit and the fix(es) for that bug together represent a "work item." It is not currently understood how these work items, which are logical groups of revisions addressing a single issue or feature, could impact the performance of algorithms such as SZZ. In this paper, we propose a heuristic that, given an input commit, uses information about changed methods to identify related commits that form a work item with the input commit. We hypothesize that, given such a work item identifying heuristic, we can identify bug-inducing commits more accurately than existing SZZ approaches. We then build a new variant of SZZ, which we call Work Item Aware SZZ (WIA-SZZ), that leverages our work item detecting heuristic to first suggest bug-inducing commits. If our heuristic fails to find any candidates, we fall back to baseline variants of SZZ. We conduct a manual evaluation to assess the accuracy of our heuristic in identifying work items. Our evaluation reveals that the heuristic is 64% accurate in finding work items but, most importantly, that it is able to find many bug-inducing commits. We then evaluate our approach on 821 repositories that have previously been used to study the performance of SZZ, comparing our work against six SZZ variants. That evaluation shows an improvement in F1 scores ranging from 2% to 9%, or from 3% to 14% when considering only the subset of cases in which a work item was found.
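The abstract describes WIA-SZZ only at a high level: try the work item heuristic first, and fall back to a baseline SZZ variant when it finds nothing. The Python sketch below shows one way that control flow could look. It is not the authors' implementation: the `Commit` type, its `changed_methods` field, the functions `find_work_item` and `wia_szz`, and the rule that a work item is any commit sharing at least one changed method with the fix are all hypothetical simplifications. The 30-day lookback and the restriction to commits made before the issue report are taken from the captions of Figures 2 and 3, not from the abstract itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Commit:
    sha: str
    date: datetime
    # Hypothetical field: qualified names of the methods this commit modified.
    changed_methods: frozenset[str] = field(default_factory=frozenset)


def find_work_item(fix: Commit, history: list[Commit],
                   start: datetime, end: datetime) -> list[Commit]:
    """Simplified work item rule (an assumption, not the paper's exact one):
    any other commit in [start, end] that shares a changed method with `fix`."""
    return [
        c for c in history
        if c.sha != fix.sha
        and start <= c.date <= end
        and c.changed_methods & fix.changed_methods
    ]


def wia_szz(fix: Commit, history: list[Commit], report_date: datetime,
            baseline_szz: Callable[[Commit], list[str]],
            lookback: timedelta = timedelta(days=30)) -> list[str]:
    # Time window: from `lookback` days before the issue report to the fix.
    work_item = find_work_item(fix, history, report_date - lookback, fix.date)
    # Part A: keep only work item commits made before the issue was reported.
    candidates = [c.sha for c in work_item if c.date < report_date]
    if candidates:
        return candidates
    # Part B: no usable work item, so defer to the underlying SZZ variant.
    return baseline_szz(fix)


# Usage with made-up commits; the issue report is filed on 2021-03-01.
report = datetime(2021, 3, 1)
fix = Commit("fc", report + timedelta(days=20), frozenset({"Parser.parse"}))
history = [
    Commit("a", report - timedelta(days=10), frozenset({"Parser.parse", "Lexer.next"})),
    Commit("b", report - timedelta(days=40), frozenset({"Parser.parse"})),  # before window
    Commit("c", report - timedelta(days=5), frozenset({"Util.log"})),  # no shared method
    Commit("e", report + timedelta(days=5), frozenset({"Parser.parse"})),  # after report
    fix,
]
print(wia_szz(fix, history, report, baseline_szz=lambda f: ["<baseline result>"]))  # ['a']
```

In this toy history, commit `e` belongs to the work item but is not suggested because it postdates the issue report, and `b` falls outside the 30-day window. The `baseline_szz` callable stands in for whichever existing SZZ variant is being extended; passing it in keeps the fallback independent of any particular variant.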

Paper Structure

This paper contains 29 sections, 6 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Overview of SZZ [szz], where the lines changed at the fix commit (FC) are annotated with a tool such as Git blame to show the previous commits that modified those lines. In this example, FC modified three lines, and those lines were most recently modified in commits c, f, and g. If an issue report is provided, the date of that report can serve as a filter that removes anything in the checkered region (in this case, commit c). Thus, FC implicates commits f and g as possibly inducing the bug.
  • Figure 2: Mining for work items. In general, the time window is a parameter specific to the algorithm using the discovered work items. For SZZ, we set the time window to span from 30 days before the issue report to the date of the fix commit (FC).
  • Figure 3: Overview of the WIA-SZZ algorithm. An instance with at least one work item before the issue report date is processed in Part A using our work item heuristic, otherwise the underlying SZZ processes the instance in Part B.
  • Figure 4: WIA-SZZ with Issue Filter (green bars) and without Issue Filter (orange bars). Issue Filter provides WIA-SZZ with context to set up a time window with respect to the issue report date and look for related work items.
  • Figure 5: SZZ evaluation dataset metrics with No Filter (dark yellow bars), Issue Filter (green bars), One Commit Filter (grey bars), and Issue + One Commit Filters (orchid bars). Note that other than the B variant, the filters do not significantly improve $F_1$ scores.
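The SZZ step described in Figure 1 and the $F_1$ metric reported in Figure 5 can be illustrated together. The Python sketch below assumes the output of Git blame has already been collected into a dict mapping each line changed by the fix commit to the commit that last modified it; the blame step itself is not shown. The function names `szz_candidates` and `f1`, the dates, and the ground-truth set are hypothetical. The `f1` function uses the standard set-based precision and recall; how the paper aggregates scores across instances is not stated here.

```python
from datetime import date
from typing import Dict, Optional, Set


def szz_candidates(blamed: Dict[int, str], commit_dates: Dict[str, date],
                   report_date: Optional[date] = None) -> Set[str]:
    """Core SZZ idea: the commits that last touched the lines changed by the
    fix. With an issue report date, commits dated after it are dropped."""
    candidates = set(blamed.values())
    if report_date is not None:
        candidates = {c for c in candidates if commit_dates[c] <= report_date}
    return candidates


def f1(predicted: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall over sets of commit ids."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Figure 1's scenario with made-up dates: the fix touched three lines whose
# last modifications were commits c, f, and g; c postdates the issue report.
blamed = {10: "c", 11: "f", 12: "g"}
dates = {"c": date(2020, 3, 15), "f": date(2020, 1, 5), "g": date(2020, 2, 1)}
report = date(2020, 3, 1)

with_filter = szz_candidates(blamed, dates, report)  # {"f", "g"}
no_filter = szz_candidates(blamed, dates)            # {"c", "f", "g"}
truth = {"f"}  # hypothetical ground-truth bug-inducing commit
print(f1(with_filter, truth))  # about 0.667
print(f1(no_filter, truth))    # 0.5
```

The toy numbers illustrate the arithmetic only and say nothing about the paper's results, where, as Figure 5 notes, the filters do not significantly improve $F_1$ except for the B variant.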