Evaluation of Version Control Merge Tools
Benedikt Schesch, Ryan Featherman, Kenneth J. Yang, Ben R. Roberts, Michael D. Ernst
TL;DR
This paper addresses the reliability of version control merge tools by introducing a realistic evaluation framework that combines automated testing, non-main-branch merges, and a cost model for incorrect merges. It compares a wide range of tools (Git Merge, Hires-Merge, Spork, IntelliMerge, and the authors' IVn family) on Java projects drawn from large code datasets, and shows that which tool performs best depends on the relative cost of an incorrect merge, captured by the parameter $k$. The study finds that simple augmentations to Git Merge can outperform more complex, structure-based tools in many scenarios, and it identifies cases where particular tools excel or fail (e.g., refactorings, adjacent-line edits, and whitespace handling). The results argue for principled, data-driven evaluation and suggest that future merge tools should balance correctness, developer time, and ease of implementation.
Abstract
A version control system, such as Git, requires a way to integrate changes from different developers or branches. Given a merge scenario, a merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual resolution. A clean integration is correct if it preserves intended program behavior, and is incorrect otherwise (e.g., if it causes a test failure). Manual resolution consumes valuable developer time, and correcting a defect introduced by an incorrect merge is even more costly. New merge tools have been proposed, but they have not yet been evaluated against one another. Prior evaluations do not properly distinguish between correct and incorrect merges, do not use a realistic set of merge scenarios, and/or do not compare to state-of-the-art tools. We have performed a more realistic evaluation. The results differ significantly from previous claims, setting the record straight and enabling better future research. Our novel experimental methodology combines running test suites, examining merges on deleted branches, and accounting for the cost of incorrect merges. Based on these evaluations, we created a merge tool that outperforms all previous tools under most assumptions. It handles the most common merge scenarios in practice.
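
To make the idea of accounting for the cost of incorrect merges concrete, here is a minimal sketch of one way such a cost model could be computed. It assumes that each conflict costs one unit of developer effort and each incorrect merge costs $k$ units. The function names, the two tool profiles, and all counts are hypothetical illustrations, and the formula is an assumption rather than the paper's exact definition.

```python
def merge_cost(conflicts: int, incorrect: int, k: float) -> float:
    """Total cost, in units of one manual conflict resolution.

    Assumed model (not taken from the paper): a conflict costs 1 unit,
    an incorrect merge costs k units, and a correct merge costs nothing.
    """
    if conflicts < 0 or incorrect < 0 or k < 0:
        raise ValueError("counts and k must be non-negative")
    return conflicts + k * incorrect


# Hypothetical outcome counts for two imaginary tools on the same scenarios.
tools = {
    "conservative": {"conflicts": 40, "incorrect": 2},
    "aggressive": {"conflicts": 10, "incorrect": 8},
}


def cheapest_tool(tools: dict, k: float) -> str:
    """Return the name of the tool with the lowest total cost for a given k."""
    return min(
        tools,
        key=lambda name: merge_cost(
            tools[name]["conflicts"], tools[name]["incorrect"], k
        ),
    )


if __name__ == "__main__":
    for k in (1, 5, 10):
        costs = {
            name: merge_cost(c["conflicts"], c["incorrect"], k)
            for name, c in tools.items()
        }
        print(f"k={k}: {costs} -> cheapest: {cheapest_tool(tools, k)}")
```

With these made-up counts, the tool that merges aggressively is cheaper when incorrect merges are inexpensive, and the conservative tool wins once $k$ is large, which illustrates why a ranking of tools can depend on $k$.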
