Designing ReLU Generative Networks to Enumerate Trees with a Given Tree Edit Distance
Mamoona Ghafoor, Tatsuya Akutsu
TL;DR
This work shows that, for rooted, ordered, vertex-labeled trees, ReLU-based generative networks can deterministically enumerate all trees within a given tree edit distance $d$ of a fixed template tree $T$. The approach encodes trees as Euler strings and reduces tree edit operations to string edits. It gives explicit network constructions with proven size and depth guarantees: TS$_d$ (substitutions) with size $O(dn^2)$, TD$_d$ (deletions) with $O(n^2)$, TI$_d$ (insertions) with $O(n^3)$, and TE$_d$ with $O(n^3)$, all at constant depth. The TE$_d$-generative model handles substitutions, deletions, and insertions simultaneously. The theoretical results are complemented by computational experiments on trees with up to 21 nodes, which show complete enumeration and deterministic generation, while comparisons with GraphRNN and GraphGDP highlight the advantages of exact, structure-preserving generation for tree-structured data. The findings provide a compact, exact framework for deterministic tree generation; scalability, network width, and practical deployment remain avenues for optimization.
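To make the Euler-string encoding mentioned above concrete, here is a minimal sketch, not taken from the paper's implementation. It assumes one common convention: labels are integers in 1..m, walking down an edge into a child labeled a emits a, and walking back up emits a + m. The tree representation and the names `euler_string` and `tree_from_euler` are hypothetical, and in this convention the root label is not part of the string, so it is passed separately.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is (label, children), labels in 1..m.
Tree = Tuple[int, list]


def euler_string(tree: Tree, m: int) -> List[int]:
    """Depth-first Euler string under the assumed convention:
    going down into a child labeled a emits a, coming back up emits a + m."""
    _, children = tree
    out: List[int] = []
    for child in children:
        label = child[0]
        out.append(label)                    # traverse the edge downward
        out.extend(euler_string(child, m))   # visit the child's subtree
        out.append(label + m)                # traverse the edge upward
    return out


def tree_from_euler(root_label: int, s: List[int], m: int) -> Tree:
    """Rebuild the tree from a valid Euler string; raise ValueError otherwise."""
    stack = [(root_label, [])]
    for sym in s:
        if sym <= m:                         # downward symbol: open a new child
            node = (sym, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:                                # upward symbol: must close the open node
            if len(stack) < 2 or stack[-1][0] != sym - m:
                raise ValueError("not a valid Euler string")
            stack.pop()
    if len(stack) != 1:
        raise ValueError("unbalanced Euler string")
    return stack[0]


# A 4-node tree (n = 3 edges) over labels {1, 2, 3}.
T = (1, [(2, [(3, [])]), (2, [])])
S = euler_string(T, m=3)   # [2, 3, 6, 5, 2, 5], length 2n = 6
```

Under this encoding a tree with n + 1 nodes maps to a string of length 2n, and only balanced strings decode back to trees, which is why a generator working on strings has to preserve that structure.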
Abstract
The generation of trees with a specified tree edit distance has significant applications across various fields, including computational biology, structured data analysis, and image processing. Recently, generative networks have been increasingly employed to synthesize new data that closely resembles the original datasets. However, the appropriate size and depth of generative networks required to generate data with a specified tree edit distance remain unclear. In this paper, we theoretically establish the existence and construction of generative networks capable of producing trees similar to a given tree with respect to the tree edit distance. Specifically, for a given rooted, ordered, and vertex-labeled tree T of size n + 1 with labels from an alphabet Σ, and a non-negative integer d, we prove that all rooted, ordered, and vertex-labeled trees over Σ with tree edit distance at most d from T can be generated using a ReLU-based generative network with size O(n^3) and constant depth. The proposed networks were implemented and evaluated for generating trees with up to 21 nodes. Due to their deterministic architecture, the networks successfully generated all valid trees within the specified tree edit distance. In contrast, the state-of-the-art graph generative models GraphRNN and GraphGDP, which rely on non-deterministic mechanisms, produced significantly fewer valid trees, achieving validation rates of only up to 35% and 48%, respectively. These findings provide a theoretical foundation for constructing compact generative models and open new directions for exact and valid tree-structured data generation. An implementation of the proposed networks is available at https://github.com/MGANN-KU/TreeGen_ReLUNetworks.
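As a rough illustration of why ReLU units suffice for exact, deterministic edits on integer-coded strings, the sketch below builds an equality test and a single-position substitution from ReLU and affine maps only. It is not the construction from the paper: the function names, the bound `C` on symbol values, and the 0-indexed position are assumptions made for the example.

```python
from typing import List


def relu(x):
    """Rectified linear unit."""
    return max(0, x)


def delta(x: int, y: int) -> int:
    """Return 1 if the integers x and y are equal, else 0.
    Uses |x - y| = relu(x - y) + relu(y - x), so only ReLU and affine maps."""
    return relu(1 - relu(x - y) - relu(y - x))


def substitute(s: List[int], pos: int, new_symbol: int, C: int) -> List[int]:
    """Replace s[pos] by new_symbol using ReLU gates.

    Assumes every symbol (including new_symbol) lies in [0, C].
    For a gate g in {0, 1} and a value v in [0, C], relu(v - C * (1 - g))
    equals g * v, which avoids an explicit multiplication."""
    out = []
    for j, a in enumerate(s):
        g = delta(j, pos)                        # 1 only at the target position
        keep = relu(a - C * g)                   # a if g == 0, else 0
        put = relu(new_symbol - C * (1 - g))     # new_symbol if g == 1, else 0
        out.append(keep + put)
    return out


# Hypothetical string over symbols 1..6; overwrite position 1 with symbol 1.
edited = substitute([2, 3, 6, 5, 2, 5], pos=1, new_symbol=1, C=6)
```

The sketch edits a single string position. In a paired encoding of a tree, relabeling one node would have to update both of its occurrences consistently, and deletions and insertions additionally shift the remaining symbols.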
