Table of Contents
Fetching ...

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Chuqin Geng, Haolin Ye, Yihan Zhang, Brigitte Pientka, Xujie Si

TL;DR

This work addresses the challenge of computing minimal, type-safe diffs between tree-structured code by reframing tree diffing as a $MaxSAT$ optimization. It introduces SatDiff, which encodes edge usage and node matching, decodes the solver's solution into low-level edits, and synthesizes concise high-level edits via a dependency graph. Empirically, SatDiff produces significantly more concise edit scripts than state-of-the-art baselines on Python and OCaml data, while maintaining reasonable runtimes and offering formal correctness and minimality guarantees as well as type-safety. The approach demonstrates strong potential for practical integration, extensibility to other languages, and IDE support, with targeted optimizations to control clause explosion and improve scalability.

Abstract

Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a novel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

TL;DR

This work addresses the challenge of computing minimal, type-safe diffs between tree-structured code by reframing tree diffing as a optimization. It introduces SatDiff, which encodes edge usage and node matching, decodes the solver's solution into low-level edits, and synthesizes concise high-level edits via a dependency graph. Empirically, SatDiff produces significantly more concise edit scripts than state-of-the-art baselines on Python and OCaml data, while maintaining reasonable runtimes and offering formal correctness and minimality guarantees as well as type-safety. The approach demonstrates strong potential for practical integration, extensibility to other languages, and IDE support, with targeted optimizations to control clause explosion and improve scalability.

Abstract

Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a novel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.
Paper Structure (26 sections, 9 theorems, 14 equations, 8 figures, 2 tables)

This paper contains 26 sections, 9 theorems, 14 equations, 8 figures, 2 tables.

Key Result

Lemma 1

There exists at least one edit effect $\mathcal{E}$ that applies to the source space $S':=(N_{S'}, E_S)$, resulting in a tree $T_\mathcal{E}$ such that $T_\mathcal{E}$ is isomorphic to the target tree $T$.

Figures (8)

  • Figure 1: Overview of the SatDiff Algorithm.
  • Figure 2: High-level Operations.
  • Figure 3: Low-level edit actions execute edge and node operations described in the edit effect.
  • Figure 4: Dependency graph of low-level edit actions. The dashed regions contain a group of low-level edit actions that can merged to high-level edit actions.
  • Figure 5: Absolute differences and relative differences in edit script sizes between different baselines and $\textsc{SatDiff}$. In Fig (a) we calculate the difference between the edit script size of the baseline approaches and SatDiff on the Python dataset while in Fig (b) we calculate their ratio. In Fig (c) we present both the difference and ratio on the OCaml dataset.
  • ...and 3 more figures

Theorems & Definitions (34)

  • Definition 4.1: Virtual nodes and the source space
  • Remark 1: Optimization on virtual nodes
  • Definition 4.2: Edit effect
  • Remark 2
  • Lemma 1
  • Definition 4.3: The edge variable
  • Remark 3
  • Definition 4.4: The match variable
  • Remark 4
  • Definition 4.5: Potential edges
  • ...and 24 more