Evaluation of Version Control Merge Tools

Benedikt Schesch, Ryan Featherman, Kenneth J. Yang, Ben R. Roberts, Michael D. Ernst

TL;DR

This paper addresses the reliability of version control merge tools by introducing a realistic evaluation framework that combines automated testing, non-main-branch merges, and a cost model for incorrect merges. It compares a range of tools (Git Merge, Hires-Merge, Spork, IntelliMerge, and the authors' IVn family) on Java projects drawn from large code datasets, and shows that performance depends on the parameter $k$, the relative cost of an incorrect merge. The study finds that simple augmentations to Git Merge can outperform more complex, structure-based tools in many scenarios, and it identifies cases where particular tools excel or fail (e.g., refactorings, adjacent-line edits, and whitespace handling). The results advocate for principled, data-driven evaluation and suggest that future work on merge tools should balance correctness, developer time, and practical ease of implementation.

Abstract

A version control system, such as Git, requires a way to integrate changes from different developers or branches. Given a merge scenario, a merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual resolution. A clean integration is correct if it preserves intended program behavior, and is incorrect otherwise (e.g., if it causes a test failure). Manual resolution consumes valuable developer time, and correcting a defect introduced by an incorrect merge is even more costly. New merge tools have been proposed, but they have not yet been evaluated against one another. Prior evaluations do not properly distinguish between correct and incorrect merges, are not evaluated on a realistic set of merge scenarios, and/or do not compare to state-of-the-art tools. We have performed a more realistic evaluation. The results differ significantly from previous claims, setting the record straight and enabling better future research. Our novel experimental methodology combines running test suites, examining merges on deleted branches, and accounting for the cost of incorrect merges. Based on these evaluations, we created a merge tool that out-performs all previous tools under most assumptions. It handles the most common merge scenarios in practice.
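The role of the cost parameter $k$ can be illustrated with a minimal sketch. This is not the paper's formula. It assumes, purely for illustration, that each conflict left for manual resolution costs one unit of developer effort and each incorrect clean merge costs $k$ units. The class name, function name, and the outcome counts are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeOutcomes:
    """Outcome counts for one merge tool (hypothetical, illustrative names)."""
    correct: int    # clean merges that pass the test suite
    unhandled: int  # conflicts handed to the developer for manual resolution
    incorrect: int  # clean merges that fail the test suite


def merge_cost(outcomes: MergeOutcomes, k: float) -> float:
    """Total cost under the ASSUMED model: 1 unit per conflict, k units per incorrect merge."""
    return outcomes.unhandled + k * outcomes.incorrect


# Made-up counts for two imaginary tools over 100 merges each.
cautious = MergeOutcomes(correct=80, unhandled=18, incorrect=2)
aggressive = MergeOutcomes(correct=88, unhandled=4, incorrect=8)

for k in (1, 3, 10):
    print(f"k={k}: cautious={merge_cost(cautious, k)}, aggressive={merge_cost(aggressive, k)}")
```

With these made-up counts, the aggressive tool is cheaper at $k = 1$ and the cautious tool is cheaper at $k = 3$ and above, which shows how a ranking of tools can flip as incorrect merges become more expensive.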

Paper Structure

This paper contains 55 sections, 1 equation, 10 figures.

Figures (10)

  • Figure 1: Mergeable changes that line-based merge reports as a conflict.
  • Figure 2: Conflicting changes that line-based merge cleanly, but incorrectly, merges. Most previous evaluations count this as a successful merge.
  • Figure 3: Command-line arguments to git merge.
  • Figure 4: Repositories and merges at each phase of collection for our data sets. "$\cap$ #f" gives the median number of files changed by both parent 1 and parent 2. The "total" lines are medians for the changes in parent 1 or parent 2. "Imp" is the percentage of merges that involve Java import statements.
  • Figure 5: Performance of different Git Merge configurations. A separate cost plot (label `fig:costplot-git`) visualizes this data.
  • ...and 5 more figures