Designing ReLU Generative Networks to Enumerate Trees with a Given Tree Edit Distance

Mamoona Ghafoor, Tatsuya Akutsu

TL;DR

This work establishes, for rooted, ordered, vertex-labeled trees, that ReLU-based generative networks can deterministically enumerate all trees within a given tree edit distance $d$ of a fixed template tree $T$. They do this by encoding trees as Euler strings and reducing tree edit operations to string edits. The work presents explicit network constructions with proven size bounds at constant depth: TS$_d$ of size $O(dn^2)$, TD$_d$ of size $O(n^2)$, TI$_d$ of size $O(n^3)$, and TE$_d$ of size $O(n^3)$. These culminate in the TE$_d$-generative model, which handles substitutions, deletions, and insertions simultaneously. Computational experiments on trees with up to 21 nodes complement the theoretical results and confirm complete enumeration and deterministic generation. Comparisons with GraphRNN and GraphGDP highlight the advantages of exact, structure-preserving generation for tree-structured data. The findings provide a compact and exact foundation for deterministic tree generation. Scalability, width optimization, and practical deployment remain open directions.
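The Euler-string encoding mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. A tree is represented as a hypothetical `(label, children)` tuple, and the root contributes no symbol. Figure 2 gives one example: a vertex labeled 2 over $\Sigma = \{1, \dots, 5\}$ has an outward edge labeled 7. Based on that example, the outward edge of a vertex labeled $a$ is assumed to carry the label $a + |\Sigma|$. The hypothetical helper `delete_vertex` shows how a tree deletion reduces to a string edit. Removing a vertex and reattaching its children to its parent, in order, amounts to deleting its matching pair of symbols.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is (label, [children]), children left to right.
Tree = Tuple[int, list]


def euler_string(tree: Tree, sigma_size: int) -> List[int]:
    """Euler string of the non-root vertices of `tree` (the root emits no symbol).

    Assumed alphabet: entering a vertex labeled a along its inward edge emits a;
    leaving it along its outward edge emits a + sigma_size.
    """
    out: List[int] = []

    def visit(node: Tree) -> None:
        label, children = node
        out.append(label)               # inward edge: parent -> node
        for child in children:
            visit(child)
        out.append(label + sigma_size)  # outward edge: node -> parent

    for child in tree[1]:
        visit(child)
    return out


def decode(s: List[int], root_label: int, sigma_size: int) -> Tree:
    """Rebuild the tree from a well-formed Euler string (inverse of euler_string)."""
    root: Tree = (root_label, [])
    stack = [root]
    for symbol in s:
        if symbol <= sigma_size:        # inward edge: open a new rightmost child
            node: Tree = (symbol, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:                           # outward edge: close the current vertex
            closed = stack.pop()
            if closed[0] != symbol - sigma_size:
                raise ValueError("mismatched Euler string")
    if len(stack) != 1:
        raise ValueError("unbalanced Euler string")
    return root


def delete_vertex(s: List[int], pos: int, sigma_size: int) -> List[int]:
    """Tree deletion as a string edit: drop the inward symbol at s[pos] and its
    matching outward symbol; the children keep their order under the parent."""
    if s[pos] > sigma_size:
        raise ValueError("s[pos] must be an inward-edge symbol")
    depth = 0
    for end in range(pos, len(s)):
        depth += 1 if s[end] <= sigma_size else -1
        if depth == 0:
            return s[:pos] + s[pos + 1:end] + s[end + 1:]
    raise ValueError("no matching outward edge")


# A small tree modeled on Figure 1; the root label 4 and |Sigma| = 7 are arbitrary.
T = (4, [(7, [(5, [(1, []), (6, [])])])])
s = euler_string(T, sigma_size=7)
print(s)  # [7, 5, 1, 8, 6, 13, 12, 14]
U = decode(delete_vertex(s, 1, sigma_size=7), root_label=4, sigma_size=7)
print(U)  # (4, [(7, [(1, []), (6, [])])])
```

Deleting the vertex labeled 5 removes only the symbols `5` and `12`, and decoding the shorter string gives the tree in which 1 and 6 are children of 7. This matches the deletion shown in Figure 1.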

Abstract

The generation of trees with a specified tree edit distance has significant applications across various fields, including computational biology, structured data analysis, and image processing. Recently, generative networks have been increasingly employed to synthesize new data that closely resembles the original datasets. However, the appropriate size and depth of generative networks required to generate data with a specified tree edit distance remain unclear. In this paper, we theoretically establish the existence and construction of generative networks capable of producing trees similar to a given tree with respect to the tree edit distance. Specifically, let T be a rooted, ordered, and vertex-labeled tree of size n + 1 with labels from an alphabet Σ, and let d be a non-negative integer. We prove that all rooted, ordered, and vertex-labeled trees over Σ with tree edit distance at most d from T can be generated using a ReLU-based generative network of size O(n^3) and constant depth. The proposed networks were implemented and evaluated on trees with up to 21 nodes. Owing to their deterministic architecture, the networks generated all valid trees within the specified tree edit distance. In contrast, the state-of-the-art graph generative models GraphRNN and GraphGDP, which rely on non-deterministic mechanisms, produced significantly fewer valid trees, with validation rates of at most 35% and 48%, respectively. These findings provide a theoretical foundation for constructing compact generative models and open new directions for exact and valid generation of tree-structured data. An implementation of the proposed networks is available at https://github.com/MGANN-KU/TreeGen_ReLUNetworks.
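Because the networks claim complete enumeration, their output can be checked against brute force on small inputs. The sketch below is a plain reference enumerator of that kind, not the ReLU construction from the paper. It performs breadth-first search over Euler strings and applies unit-cost substitutions, deletions, and insertions to non-root vertices. It assumes that symbols $1, \dots, |\Sigma|$ are inward edges and that the outward edge of a vertex labeled $a$ is $a + |\Sigma|$. The root is never edited. Every string returned encodes a tree reachable from $T$ by at most $d$ edits, and hence within tree edit distance $d$ of $T$. The function names are hypothetical.

```python
from typing import Set, Tuple

Euler = Tuple[int, ...]  # Euler string of the non-root vertices


def _match(s: Euler, pos: int, sigma: int) -> int:
    """Index of the outward symbol matching the inward symbol at s[pos]."""
    depth = 0
    for end in range(pos, len(s)):
        depth += 1 if s[end] <= sigma else -1
        if depth == 0:
            return end
    raise ValueError("unbalanced Euler string")


def _neighbours(s: Euler, sigma: int) -> Set[Euler]:
    """All Euler strings one unit-cost edit away from s (non-root vertices only)."""
    out: Set[Euler] = set()
    for i, c in enumerate(s):
        if c > sigma:
            continue                      # only inward symbols start a vertex
        j = _match(s, i, sigma)
        # Substitution: relabel both symbols of the vertex.
        for a in range(1, sigma + 1):
            if a != c:
                out.add(s[:i] + (a,) + s[i + 1:j] + (a + sigma,) + s[j + 1:])
        # Deletion: drop the matching pair; the children move up to the parent.
        out.add(s[:i] + s[i + 1:j] + s[j + 1:])
    # Insertion: a new vertex adopts a balanced block s[i:j] of consecutive siblings.
    for i in range(len(s) + 1):
        depth = 0
        for j in range(i, len(s) + 1):
            if depth == 0:
                for a in range(1, sigma + 1):
                    out.add(s[:i] + (a,) + s[i:j] + (a + sigma,) + s[j:])
            if j < len(s):
                depth += 1 if s[j] <= sigma else -1
                if depth < 0:             # block would leave the enclosing vertex
                    break
    return out


def ted_ball(s: Euler, d: int, sigma: int) -> Set[Euler]:
    """Euler strings reachable from s with at most d unit-cost edits."""
    seen = {s}
    frontier = {s}
    for _ in range(d):
        frontier = {t for u in frontier for t in _neighbours(u, sigma)} - seen
        seen |= frontier
    return seen


# Root with a single child labeled 1, over Sigma = {1, 2}.
print(len(ted_ball((1, 3), d=1, sigma=2)))  # 9
```

For this one-edge tree, the nine results are $T$ itself, one substitution, one deletion, and six distinct insertions. The search grows quickly with $n$ and $d$, so it is only practical as a completeness check on small instances.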

Paper Structure

This paper contains 9 sections, 7 theorems, 26 equations, 14 figures, 7 tables.

Key Result

Lemma 1

Let $T$ be a tree with $n$ edges, and let $x = x_1, x_2, \ldots, x_{d}$ be a random DFS sequence of integers from the interval $[0, n]$. Then there exists a ReLU network of size $\mathcal{O}(dn)$ and constant depth that, for each $j$ with $x_j \neq 0$, identifies the label of the inward edge of the vertex with DFS index $x_j$.
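The selection step in Lemma 1 can be illustrated with a generic constant-depth ReLU gadget. This is a simplified sketch, not the paper's construction, which works over Euler-string positions through the variables $p_i$, $p'_i$, $q_{ji}$, and $r'_{ji}$. Here the labels of the fixed tree $T$ are assumed to be given already indexed by vertex DFS index and bounded by a hypothetical constant `max_label` $= M$. Since $T$ is fixed, these labels act as constant biases. The network computes $|x_j - i|$ with two ReLUs and the equality indicator $[x_j = i] = \mathrm{ReLU}(1 - |x_j - i|)$ for integer inputs. It then gates each label with $\mathrm{ReLU}(\ell_i - M(1 - [x_j = i]))$, which equals $\ell_i$ when the indicator is 1 and 0 otherwise.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def select_inward_labels(x, inward_label, max_label):
    """For each query x_j, return inward_label[x_j] (label of the inward edge of
    the vertex with DFS index x_j), or 0 when x_j == 0 (the root has no inward edge).

    x            : (d,) integer DFS indices in [0, n]
    inward_label : (n + 1,) labels indexed by vertex DFS index; entry 0 is unused
    max_label    : assumed upper bound M on the labels, used by the gating layer
    """
    x = np.asarray(x, dtype=float)[:, None]                    # (d, 1)
    idx = np.arange(1, len(inward_label), dtype=float)[None]  # (1, n), skips root
    # Hidden layer 1 (2dn units): |x_j - i| = relu(x_j - i) + relu(i - x_j).
    dist = relu(x - idx) + relu(idx - x)
    # Hidden layer 2 (dn units): indicator [x_j == i] for integer inputs.
    eq = relu(1.0 - dist)
    # Hidden layer 3 (dn units): keep label_i only where the indicator is 1.
    labels = np.asarray(inward_label[1:], dtype=float)[None]
    gated = relu(labels - max_label * (1.0 - eq))
    # Linear output: at most one term per row is non-zero.
    return gated.sum(axis=1)


# Hypothetical labels for vertices with DFS indices 0..5 (entry 0 is the root).
labels = [0, 3, 1, 2, 4, 5]
print(select_inward_labels([1, 3, 0], labels, max_label=5))  # [3. 2. 0.]
```

The three hidden layers use $2dn + dn + dn$ ReLU units followed by a linear sum. This illustrates how an $\mathcal{O}(dn)$-size, constant-depth network can perform the lookup.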

Figures (14)

  • Figure 1: Tree deletion and insertion operations. In $T$, the node with label 7 is the parent of the node with label 5, which is deleted; in the resulting tree $U$, the node with label 7 becomes the parent of the children of node 5, labeled 1 and 6. Conversely, inserting a node with label 5 as a child of the node with label 7 in $U$, with the nodes labeled 1 and 6 becoming its children, yields $T$. The order of the children 1 and 6 is preserved by both operations.
  • Figure 2: (a) A vertex-labeled, rooted, and ordered tree $T$ with six vertices, root $r$, label set $\Sigma = \{1, 2, 3, 4, 5\}$, and a random sequence $x_1, x_2, x_3 = 1, 3, 0$; the labels are shown inside the vertices and the DFS indices in green. (b) The directed tree corresponding to $T$ in (a). Inward and outward edges are drawn as solid and dashed directed lines, respectively; their labels are shown in black and their DFS indices in brown. The vertex $u$ with label $3$ is the parent of $v$ with label $2$. Corresponding to the edge $uv$ in $T$, the directed tree has an inward edge $(u, v)$ with label $2$ and an outward edge $(v, u)$ with label $7$. Since the DFS index of $v$ in $T$ is 3, $(u, v)$ and $(v, u)$ are, respectively, the inward and outward edges of $x_2 = 3$. No edge corresponds to $x_3 = 0$. (c) The Euler string $E(T)$.
  • Figure 3: Illustrations of the variables $p_i$, $p'_i$, $q_{ji}$, and $r'_{ji}$ used in the equations of Lemma 1: (a) The variable $p_i$, which is 1 for inward (black) edges and 0 for outward (gray) edges in the directed tree corresponding to the tree $T$ in Figure 2(a); e.g., $p_2 = 1$ (resp., $p_9 = 0$) because the edge with DFS index 2 is inward (resp., the edge with DFS index 9 is outward). (b) The variable $p'_i$ (green); e.g., $p'_8 = 5$ means that the inward edge of vertex $5$ has DFS index 8 in (a). (c) For a fixed DFS index $i$, either $q_{ji} = 1$ for some $j$ (black inward edges) or $q_{ji} = 0$ for all $j$ (gray inward edges). For example, for $i = 4$ we have $q_{2,4} = 1$, so the inward edge with DFS index $4$ is drawn in black, whereas for $i = 8$ there is no $j$ with $q_{j,8} = 1$, so the inward edge with DFS index $8$ is drawn in gray. The black edges are the desired inward edges whose labels are required; these labels are stored in $r'_{ji}$. For example, $r'_{2,4} = 2$ means that the desired inward edge specified by $x_2$ has DFS index $4$ and label $2$.
  • Figure 4: An illustration of the numbers of inward edges (green) and outward edges (blue) for the edges $t_i$, $i = 7, 9$, and $t_{\ell}$, $\ell = 2, 3, 5$, which are depicted in gray boxes.
  • Figure 5: An illustration of the variables used in the equations of the corresponding lemma to identify the positions and labels of the desired outward edges.
  • ...and 9 more figures
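The variables illustrated in Figure 3 can also be computed directly from an Euler string. This gives a non-neural reference for what the Lemma 1 network is meant to output. The sketch rests on several assumptions. Vertices are numbered in DFS preorder with the root as 0. Under that numbering, the vertex entered at position $i$ has index $p'_i = p_i \sum_{k \le i} p_k$, the running count of inward edges, which is the kind of count illustrated in Figure 4. Inward symbols are $1, \dots, |\Sigma|$, and the outward symbol of a vertex labeled $a$ is $a + |\Sigma|$. The ten-symbol string below is hypothetical and is not necessarily the tree of Figure 2. The function names are also illustrative.

```python
from typing import Dict, List, Tuple


def inward_variables(s: List[int], sigma: int) -> Tuple[List[int], List[int]]:
    """p[i] = 1 if position i (1-based) of the Euler string is an inward edge, else 0;
    p_prime[i] = preorder DFS index of the vertex entered at position i (0 if outward)."""
    p = [0] * (len(s) + 1)        # index 0 unused so positions stay 1-based
    p_prime = [0] * (len(s) + 1)
    entered = 0                   # running count of inward edges seen so far
    for i, symbol in enumerate(s, start=1):
        if symbol <= sigma:
            p[i] = 1
            entered += 1
            p_prime[i] = entered
    return p, p_prime


def query_labels(s: List[int], sigma: int, x: List[int]) -> Dict[Tuple[int, int], int]:
    """r_prime[(j, i)] = label at position i for every pair with q_{ji} = 1, i.e. the
    inward edge at position i belongs to the vertex with DFS index x_j != 0."""
    _, p_prime = inward_variables(s, sigma)
    r_prime: Dict[Tuple[int, int], int] = {}
    for j, xj in enumerate(x, start=1):
        for i in range(1, len(s) + 1):
            if xj != 0 and p_prime[i] == xj:   # q_{ji} = 1
                r_prime[(j, i)] = s[i - 1]
    return r_prime


s = [3, 1, 6, 2, 7, 8, 4, 5, 10, 9]   # hypothetical tree over Sigma = {1, ..., 5}
p, p_prime = inward_variables(s, sigma=5)
print(p[1:])        # [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]
print(p_prime[1:])  # [1, 2, 0, 3, 0, 0, 4, 5, 0, 0]
print(query_labels(s, sigma=5, x=[1, 3, 0]))  # {(1, 1): 3, (2, 4): 2}
```

The hypothetical string was chosen so that the values agree with the examples quoted in the Figure 3 caption ($p_2 = 1$, $p_9 = 0$, $p'_8 = 5$, $r'_{2,4} = 2$). This makes it easy to compare the two side by side.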

Theorems & Definitions (21)

  • Lemma 1
  • proof
  • Example 1
  • Proposition 1
  • proof
  • Example 2
  • Lemma 2
  • proof
  • Example 3
  • Theorem 1
  • ...and 11 more