
Refactoring-aware Block Tracking in Commit History

Mohammed Tayeeb Hasan, Nikolaos Tsantalis, Pouria Alikhanifard

TL;DR

The paper addresses the need for fine-grained, refactoring-aware tracking of code blocks in commit histories, focusing on Java. It introduces CodeTracker 2.0, which uses RefactoringMiner 3.0 to map statements across commits and builds a change-history graph that supports forks and cross-type transformations. On an extended block-change oracle (1,280 blocks across 200 methods in 20 repositories), CodeTracker achieves a change-level precision/recall of 99.5% and a commit-level precision/recall of around 99.8%, outperforming a GumTree-based baseline. The tool ships with a Java API and a Chrome extension that visualizes block histories on GitHub, so developers and researchers can follow block evolution, migrations, and refactorings at practical speed (median about 2 s, average about 3.6 s), which makes it applicable to software maintenance and evolution studies.

Abstract

Tracking statements in the commit history of a project is in many cases useful for supporting various software maintenance, comprehension, and evolution tasks. A high level of accuracy can facilitate the adoption of code tracking tools by developers and researchers. To this end, we propose CodeTracker, a refactoring-aware tool that can generate the commit change history for code blocks. To evaluate its accuracy, we created an oracle with the change history of 1,280 code blocks found within 200 methods from 20 popular open-source project repositories. Moreover, we created a baseline based on the current state-of-the-art Abstract Syntax Tree diff tool, namely GumTree 3.0, in order to compare the accuracy and execution time. Our experiments have shown that CodeTracker has a considerably higher precision/recall and faster execution time than the GumTree-based baseline, and can extract the complete change history of a code block with a precision and recall of 99.5% within 3.6 seconds on average.
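
The precision and recall figures quoted above compare a tool's reported change history against the oracle. As a rough illustration only, the sketch below computes both metrics with plain set arithmetic. The class name `ChangeHistoryMetrics`, the `commit:changeKind` string encoding, and the sample data are assumptions made for this example; they are not taken from CodeTracker or from the paper's evaluation scripts.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of CodeTracker: set-based precision/recall.
final class ChangeHistoryMetrics {

    // Number of reported entries that also appear in the oracle.
    static int truePositives(Set<String> reported, Set<String> oracle) {
        Set<String> hits = new HashSet<>(reported);
        hits.retainAll(oracle);
        return hits.size();
    }

    // Fraction of reported entries that are correct (0.0 if nothing was reported).
    static double precision(Set<String> reported, Set<String> oracle) {
        if (reported.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(reported, oracle) / reported.size();
    }

    // Fraction of oracle entries that were found (0.0 if the oracle is empty).
    static double recall(Set<String> reported, Set<String> oracle) {
        if (oracle.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(reported, oracle) / oracle.size();
    }

    public static void main(String[] args) {
        // Made-up data: each entry encodes "commit:changeKind" for one tracked block.
        Set<String> oracle = Set.of("c1:introduced", "c2:body change", "c3:moved");
        Set<String> reported = Set.of("c1:introduced", "c2:body change", "c4:moved");

        System.out.println("precision = " + precision(reported, oracle));
        System.out.println("recall    = " + recall(reported, oracle));
    }
}
```

In this made-up example two of the three reported entries match the oracle, so precision and recall are both 2/3. Under this sketch's assumption, a commit-level comparison would drop the change-kind suffix and compare commit identifiers alone.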

Paper Structure

This paper contains 21 sections, 7 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: Hierarchy of supported change kinds for code blocks.
  • Figure 2: Overview of the block tracking process steps.
  • Figure 3: Fluent API for block tracking.
  • Figure 4: CodeTracker Chrome browser extension visualizing the change history for a selected program element.
  • Figure 5: Hovering over a node provides more semantic information about the changes that occurred on the tracked code element.
  • ...and 8 more figures