SAT-DIFF: A Tree Diffing Framework Using SAT Solving
Chuqin Geng, Haolin Ye, Yihan Zhang, Brigitte Pientka, Xujie Si
TL;DR
This work addresses the challenge of computing minimal, type-safe diffs between tree-structured code by reframing tree diffing as a $MaxSAT$ optimization. It introduces SatDiff, which encodes edge usage and node matching, decodes the solver's solution into low-level edits, and synthesizes concise high-level edits via a dependency graph. Empirically, SatDiff produces significantly more concise edit scripts than state-of-the-art baselines on Python and OCaml data, while maintaining reasonable runtimes and offering formal correctness and minimality guarantees as well as type-safety. The approach demonstrates strong potential for practical integration, extensibility to other languages, and IDE support, with targeted optimizations to control clause explosion and improve scalability.
Abstract
Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a novel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.
