Directed Graph Grammars for Sequence-based Learning

Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen

TL;DR

Directed Graph Grammars for Sequence-based Learning (DIGGED) introduces a principled, lossless mapping from DAGs to sequences using an edNCE graph grammar with linear parse trees. Through unsupervised grammar induction driven by Minimum Description Length (MDL) and a disambiguation procedure, it yields a one-to-one, onto, deterministic, and valid graph-to-sequence representation suitable for generation, prediction, and Bayesian optimization. The approach uses either a graph encoder or a rule-sequence Transformer encoder within an autoencoder, and reports strong performance on neural architecture, Bayesian network, and circuit design tasks, including real-world case studies. The resulting sequences are compositional and interpretable, so graph generative modeling can draw directly on modern sequence models.

Abstract

Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact and equivalent sequential representation of a DAG. Specifically, we view a graph as derivations over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the sequence representational continuity for Bayesian Optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.
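The decoding difficulty named in the abstract, that one DAG admits many topological orders, can be seen on a very small example. The sketch below is an illustration only and is not taken from the paper's code: the four-node diamond graph and the helper name `topological_orders` are assumptions made for this example. It enumerates every node ordering consistent with the edges by brute force.

```python
from itertools import permutations

def topological_orders(nodes, edges):
    """Return every ordering of `nodes` in which each edge (u, v) places u before v."""
    orders = []
    for perm in permutations(nodes):
        position = {node: i for i, node in enumerate(perm)}
        if all(position[u] < position[v] for u, v in edges):
            orders.append(perm)
    return orders

# Hypothetical diamond-shaped DAG: a -> b, a -> c, b -> d, c -> d
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

orders = topological_orders(nodes, edges)
for order in orders:
    print(" ".join(order))
# prints:
# a b c d
# a c b d
```

Even this tiny graph has two valid node sequences, so a decoder that emits nodes one at a time has no canonical target. A representation in which each DAG corresponds to exactly one rule sequence, as the abstract proposes, removes that choice.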

Paper Structure

This paper contains 52 sections, 1 equation, 9 figures, 17 tables.

Figures (9)

  • Figure 1: We adopt the edNCE grammar formalism. (Top): Dataset $\mathcal{D}=\{H_1,H_2,H_3\}$; (Middle): Step 1 (Sec 3.2.1). Our approximate frequent subgraph mining library finds candidate subgraphs. As an example, the induced subgraph from nodes 1 & 2 in all three DAGs is considered. Its occurrences in $H_1, H_2, H_3$ are grounded. Step 2 (Sec 3.2.2). For each possible assignment of gray edge directions, bounds on the set of instructions are deduced. For example, the subgraph occurrence in $H_1$ includes into $I$, "for each green in-neighbor (gray), add out-edge (black) from node 2", and excludes from $I$, "for each green in-neighbor, add out-edge (black) from node 1". $H_2$ includes into $I$: "for each green out-neighbor, add out-edges from both nodes 1 and 2". Suppose we had reversed the gray arrow in $H_1$. Then, the exclusion set of case $H_1$ conflicts with the inclusion set of $H_2$, since it's unclear if we should add out-edges from both 1 & 2 to each green out-neighbor, or just node 2. Intuitively, cases that differ in the precondition of edge direction are labeled with separate letters (e.g. a vs b), inducing different but non-conflicting instructions. Step 3 (Sec 3.2.2). Given bounds on the instruction set for each motif occurrence, the final set of instructions is deduced from the (approximate) solution of a max clique problem. Each node is a (motif occurrence, edge redirections) realization. Each edge indicates compatibility. Step 4 (Sec 3.2.3). The candidate motif and the associated solution to Step 3 that minimize the total data description length are chosen to define a grammar rule. Then, Steps 1-4 are repeated until convergence. (Bottom): A grammar rule consists of a subgraph (gray) and instructions to connect it to the neighborhood. Instructions are grouped by letters, identifying the node label and its directional relationship to the parent gray node.
  • Figure 2: (Top) Our grammar induction framework iteratively minimizes the total description length of $\mathcal{D}$, contracting common and compatible motifs, producing grammar rules while parsing the input according to the grammar. (Bottom-left) Our induction algorithm builds the token dictionary, where individual rules are the tokens used in a faithful sequential representation of the DAG. (Bottom-right) We experiment with two ways to encode the DAG: 1) using a full-attention Transformer encoder vs 2) using a GNN tailored to DAGs [thost2021directed]; in both cases, we use a causal, autoregressive Transformer decoder within an autoencoder framework, while jointly learning the embedding dictionary.
  • Figure 3: We visualize the best discovered designs from BO. We reproduce the same BO and evaluation setup as [zhang2019d, pham2018efficient, dong2023cktgnn].
  • Figure 4: We show $M:=|H|$ as a function of iteration (same as the number of rules induced). Axes are scaled to 1.0 for standardization across datasets. The lower legend follows the format initial $|H| \rightarrow$ pre-termination $|H| \rightarrow$ post-termination $|H|$ (=$|\mathcal{D}|$).
  • Figure 5: We stratify the test error distribution across the parse length. For reference, we also include a count of the number of test set examples of each parse length.
  • ...and 4 more figures
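Steps 2 and 3 in the Figure 1 caption can be made concrete with a toy version of the compatibility check and clique search. This is a minimal sketch under stated assumptions, not the authors' implementation: the encoding of an instruction as a `(neighbor_direction, source_node)` tuple, the dictionary layout, the rule that two realizations of the same occurrence exclude each other, and the brute-force search (the caption mentions an approximate solver) are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Each realization is a (motif occurrence, edge redirection) pair with the
# instructions it requires ("include") and forbids ("exclude").
# An instruction is encoded here as (neighbor_direction, source_node); this
# encoding is an assumption for the example, loosely following the caption.
realizations = {
    "H1":          {"occurrence": "H1", "include": {("in", 2)},  "exclude": {("in", 1)}},
    "H1_reversed": {"occurrence": "H1", "include": {("out", 2)}, "exclude": {("out", 1)}},
    "H2":          {"occurrence": "H2", "include": {("out", 1), ("out", 2)}, "exclude": set()},
}

def compatible(a, b):
    """Two realizations are compatible if neither requires what the other forbids."""
    if a["occurrence"] == b["occurrence"]:
        return False  # assumed: pick at most one redirection per occurrence
    return not (a["include"] & b["exclude"]) and not (b["include"] & a["exclude"])

def max_clique(items):
    """Brute-force the largest pairwise-compatible subset (toy sizes only)."""
    names = list(items)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(items[x], items[y]) for x, y in combinations(subset, 2)):
                return set(subset)
    return set()

clique = max_clique(realizations)
# The final instruction set is the union of what the chosen realizations require.
instructions = set().union(*(realizations[name]["include"] for name in clique))
print(sorted(clique))
print(sorted(instructions))
```

In this toy data, `H1_reversed` forbids `("out", 1)` while `H2` requires it, which mirrors the conflict described in the caption, so the search keeps the original orientation of `H1` together with `H2`.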