Learning Tree Pattern Transformations

Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, Thomas Zeume

TL;DR

The paper tackles the problem of learning concise explanations for structural differences between labelled, ordered trees by representing explanations as a small set of tree pattern transformations. It introduces a pattern-based language with injective matching and body/head patterns to capture local rearrangements and substitutions, and formalizes the LearningTreeTransformations problem. It establishes strong hardness results (NP-hard, and NP-complete for restricted cases) via reductions from VertexCover and 3-SAT, and offers a practical SAT-based encoding to solve real-world instances, including CS-education data. It also discusses extending the language with interval variables to model tree edits, while highlighting the trade-offs in expressivity and tractability. Overall, the work provides a principled computational framework for extracting high-level structural explanations for tree-structured data with potential educational impact.

Abstract

Explaining why and how a tree $t$ structurally differs from another tree $t^\star$ is a question that is encountered throughout computer science, including in understanding tree-structured data such as XML or JSON data. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set $\{(t_1, t_1^\star),\dots, (t_n, t_n^\star)\}$ of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs $(t_i, t_i^\star)$? This raises two research questions: (i) what is a good notion of "rule" in this context?; and (ii) how can sets of rules explaining a data set be learned algorithmically? We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g. showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.

Paper Structure

This paper contains 21 sections, 9 theorems, 37 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Proposition 1

Given a tree pattern transformation $\rho\colon \sigma \rightsquigarrow \sigma^\star$ and a pair $(t, t^\star)$ of trees, it can be tested in polynomial time whether $\rho$ can transform $t$ into $t^\star$, i.e. whether $t \rightsquigarrow_\rho t^\star$.

Figures (1)

  • Figure 1: (a) Two pairs of syntax trees for the pairs $(E \to (\lnot A \land \lnot C), (\lnot A \land \lnot C) \to E)$ and $((B \to D) \wedge A, (D \to B) \wedge A)$ of propositional formulas. (b) A tree pattern transformation explaining the structural differences of both pairs. The transformation selects a $\rightarrow$-labelled node and swaps its subtrees $Y_1$ and $Y_2$.
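
The transformation in Figure 1(b) can be illustrated with a minimal sketch. It is not the paper's pattern language or its matching algorithm: it hard-codes the single rule "pick one $\rightarrow$-labelled node and swap its two subtrees", and it assumes a tuple encoding of trees together with the hypothetical helper names `swap_at_arrow` and `explains`. For this one rule, the check tries each $\rightarrow$ node once, which is in the spirit of the polynomial-time test of Proposition 1.

```python
from typing import Iterator, Tuple

# Assumed encoding: a tree is a tuple (label, child_1, ..., child_k);
# a leaf is a 1-tuple such as ("A",).
Tree = Tuple


def swap_at_arrow(t: Tree) -> Iterator[Tree]:
    """Yield every tree obtained from t by swapping the two
    subtrees of exactly one '->'-labelled node."""
    label, *children = t
    # Apply the rule at the root if it is a binary '->' node.
    if label == "->" and len(children) == 2:
        yield (label, children[1], children[0])
    # Otherwise (or additionally) apply the rule somewhere below the root.
    for i, child in enumerate(children):
        for new_child in swap_at_arrow(child):
            yield (label, *children[:i], new_child, *children[i + 1:])


def explains(t: Tree, t_star: Tree) -> bool:
    """Check whether one application of the swap rule turns t into t_star."""
    return any(candidate == t_star for candidate in swap_at_arrow(t))


# The two pairs of formulas from Figure 1(a), as syntax trees.
not_a_and_not_c = ("and", ("not", ("A",)), ("not", ("C",)))
t1 = ("->", ("E",), not_a_and_not_c)
t1_star = ("->", not_a_and_not_c, ("E",))

t2 = ("and", ("->", ("B",), ("D",)), ("A",))
t2_star = ("and", ("->", ("D",), ("B",)), ("A",))

both_explained = explains(t1, t1_star) and explains(t2, t2_star)
```

Here `both_explained` records whether the single swap rule accounts for both pairs, which is what it means for one transformation to explain a small data set. A general pattern would also need node and subtree variables with injective matching, which this sketch deliberately omits.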

Theorems & Definitions (17)

  • Proposition 1
  • Proof sketch
  • Theorem 1
  • Proposition 2
  • Proof sketch
  • Proposition 3
  • Proof sketch
  • Proposition 4
  • Proof sketch
  • Theorem 2
  • ...and 7 more