SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Chuqin Geng; Haolin Ye; Yihan Zhang; Brigitte Pientka; Xujie Si

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Chuqin Geng, Haolin Ye, Yihan Zhang, Brigitte Pientka, Xujie Si

TL;DR

This work addresses the challenge of computing minimal, type-safe diffs between tree-structured code by reframing tree diffing as a $MaxSAT$ optimization. It introduces SatDiff, which encodes edge usage and node matching, decodes the solver's solution into low-level edits, and synthesizes concise high-level edits via a dependency graph. Empirically, SatDiff produces significantly more concise edit scripts than state-of-the-art baselines on Python and OCaml data, while maintaining reasonable runtimes and offering formal correctness and minimality guarantees as well as type-safety. The approach demonstrates strong potential for practical integration, extensibility to other languages, and IDE support, with targeted optimizations to control clause explosion and improve scalability.

Abstract

Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a novel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

TL;DR

This work addresses the challenge of computing minimal, type-safe diffs between tree-structured code by reframing tree diffing as a

optimization. It introduces SatDiff, which encodes edge usage and node matching, decodes the solver's solution into low-level edits, and synthesizes concise high-level edits via a dependency graph. Empirically, SatDiff produces significantly more concise edit scripts than state-of-the-art baselines on Python and OCaml data, while maintaining reasonable runtimes and offering formal correctness and minimality guarantees as well as type-safety. The approach demonstrates strong potential for practical integration, extensibility to other languages, and IDE support, with targeted optimizations to control clause explosion and improve scalability.

Abstract

Paper Structure (26 sections, 9 theorems, 14 equations, 8 figures, 2 tables)

This paper contains 26 sections, 9 theorems, 14 equations, 8 figures, 2 tables.

Introduction
Preliminary -- The MaxSAT problem
Overview of SatDiff
Methodology
The encoding phase --- Generating variables and constraints to encode tree diffing
Allocating variables to encode the usage of edges and matches
Encoding the tree diffing problem using hard and soft constraints
Interpret the solution to $\Phi_{{\textsc{SatDiff}}}$ as edit effect.
The decoding phase --- Decoding low-level edit scripts from MaxSAT solution
The synthesis phase --- Merging high-level edit actions from low-level edit actions
Extensions
Supporting type safety
Supporting flexible edits
Experimental Evaluation
Datasets.
...and 11 more sections

Key Result

Lemma 1

There exists at least one edit effect $\mathcal{E}$ that applies to the source space $S':=(N_{S'}, E_S)$, resulting in a tree $T_\mathcal{E}$ such that $T_\mathcal{E}$ is isomorphic to the target tree $T$.

Figures (8)

Figure 1: Overview of the SatDiff Algorithm.
Figure 2: High-level Operations.
Figure 3: Low-level edit actions execute edge and node operations described in the edit effect.
Figure 4: Dependency graph of low-level edit actions. The dashed regions contain a group of low-level edit actions that can merged to high-level edit actions.
Figure 5: Absolute differences and relative differences in edit script sizes between different baselines and $\textsc{SatDiff}$. In Fig (a) we calculate the difference between the edit script size of the baseline approaches and SatDiff on the Python dataset while in Fig (b) we calculate their ratio. In Fig (c) we present both the difference and ratio on the OCaml dataset.
...and 3 more figures

Theorems & Definitions (34)

Definition 4.1: Virtual nodes and the source space
Remark 1: Optimization on virtual nodes
Definition 4.2: Edit effect
Remark 2
Lemma 1
Definition 4.3: The edge variable
Remark 3
Definition 4.4: The match variable
Remark 4
Definition 4.5: Potential edges
...and 24 more

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

TL;DR

Abstract

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (8)

Theorems & Definitions (34)