Directed Graph Grammars for Sequence-based Learning
Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen
TL;DR
Directed Graph Grammars for Sequence-based Learning (DIGGED) introduces a principled, lossless mapping from DAGs to sequences using an edNCE graph grammar with linear parse trees. It combines unsupervised grammar induction, driven by the Minimum Description Length (MDL) principle, with a disambiguation procedure. The resulting graph-to-sequence representation is one-to-one, onto, deterministic, and valid, which makes it suitable for generation, prediction, and Bayesian optimization. Within an autoencoder, the approach can be paired with either a graph encoder or a rule-sequence Transformer. It demonstrates strong performance on neural architecture, Bayesian network, and circuit design tasks, including real-world case studies. The work advances graph generative modeling by providing compositional, interpretable sequence representations of graphs that modern sequence models can consume directly.
Abstract
Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact, and equivalent sequential representation of a DAG. Specifically, we view a graph as a derivation over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the continuity of the sequence representation for Bayesian optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.
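
To make the decoding difficulty concrete, the minimal sketch below enumerates the topological orders of a small DAG by brute force. It is an illustration only, not part of DIGGED: the toy graph, the node names, and the helper `topological_orders` are all hypothetical. It shows that a naive "list the nodes in topological order" serialization can map one graph to several different sequences, which is the ambiguity a unique rule sequence is meant to avoid.

```python
from itertools import permutations


def topological_orders(nodes, edges):
    """Return every ordering of `nodes` in which each edge (u, v) has u before v.

    Brute force over all permutations, so only suitable for tiny graphs.
    """
    orders = []
    for perm in permutations(nodes):
        position = {node: i for i, node in enumerate(perm)}
        if all(position[u] < position[v] for u, v in edges):
            orders.append(perm)
    return orders


# Hypothetical toy DAG: one input feeds three parallel operations,
# which are then merged into a single output.
nodes = ["in", "a", "b", "c", "out"]
edges = [
    ("in", "a"), ("in", "b"), ("in", "c"),
    ("a", "out"), ("b", "out"), ("c", "out"),
]

orders = topological_orders(nodes, edges)
print(len(orders))  # the three parallel nodes can appear in any of 3! = 6 orders
for order in orders:
    print(" -> ".join(order))
```

Even this five-node graph has six valid node orderings, and the count grows quickly as more parallel branches are added, so a sequence decoder trained on node orderings has no single target sequence per graph.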
