Are Expressive Encoders Necessary for Discrete Graph Generation?

Jay Revolinsky; Harry Shomer; Jiliang Tang

TL;DR

A systematic ablation study quantifies the benefit of each GenGNN component and indicates that residual connections are needed to mitigate oversmoothing on complex graph structures. The paper also investigates learned diffusion representations to determine whether GNNs can serve as expressive neural backbones for discrete diffusion.

Abstract

Discrete graph generation has emerged as a powerful paradigm for modeling graph data, often relying on highly expressive neural backbones such as transformers or higher-order architectures. We revisit this design choice by introducing GenGNN, a modular message-passing framework for graph generation. Diffusion models with GenGNN achieve more than 90% validity on the Tree and Planar datasets, within a small margin of graph transformers, with 2-5x faster inference. For molecule generation, DiGress with a GenGNN backbone achieves 99.49% validity. A systematic ablation study quantifies the benefit of each GenGNN component and indicates that residual connections are needed to mitigate oversmoothing on complex graph structures. Through scaling analyses, we apply a principled metric-space view to learned diffusion representations and examine whether GNNs can be expressive neural backbones for discrete diffusion.
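
The oversmoothing effect mentioned in the abstract can be illustrated with a toy example. The sketch below is not the GenGNN architecture: it assumes plain mean aggregation on a 6-node path graph and swaps in an initial-residual anchor (mixing the input signal back in at every layer, controlled by a hypothetical parameter `alpha`) as a stand-in for the paper's residual connections. The helper names `mean_aggregate`, `dispersion`, and `propagate` are illustrative only.

```python
import math

def mean_aggregate(adj, x):
    """One round of mean aggregation over each node's neighbours plus itself."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [i] + nbrs
        out.append(sum(x[j] for j in group) / len(group))
    return out

def dispersion(x):
    """Standard deviation of the node signals; values near 0 indicate collapse."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))

def propagate(adj, x0, layers, alpha=0.0):
    """Stack `layers` aggregation rounds.

    alpha is a hypothetical mixing weight: alpha > 0 blends the input
    signal x0 back in after every round (an initial-residual anchor).
    """
    x = list(x0)
    for _ in range(layers):
        agg = mean_aggregate(adj, x)
        x = [(1 - alpha) * a + alpha * s for a, s in zip(agg, x0)]
    return x

# Path graph on 6 nodes, given as adjacency lists.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
x0 = [1.0, 0.0, 0.0, 0.0, 0.0, -1.0]

plain = dispersion(propagate(adj, x0, layers=64))
anchored = dispersion(propagate(adj, x0, layers=64, alpha=0.2))
print(f"dispersion without residual: {plain:.2e}")
print(f"dispersion with residual:    {anchored:.2e}")
```

Without the anchor, repeated averaging drives all node signals toward a common value and the dispersion shrinks toward zero with depth; with the anchor, the signals stay distinguishable. This is only a one-dimensional caricature of the behaviour the ablation study examines.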

Paper Structure (34 sections, 9 theorems, 35 equations, 15 figures, 10 tables)

Introduction
Background
Preliminary
Discrete Graph Diffusion
Message-Passing in Graph Generative Models
Oversmoothing in GNNs
Evaluation
GenGNN
Model Design
Edge and Node Gating
Feed-forward networks (FFN)
Residual connections
Layer normalization
Unified update form
Pooling operators
...and 19 more sections

Key Result

theorem 1

Given Eqn. eq:unit_distance and Assumptions ass:rrwp_nondeg1--ass:backbone_bound1, let $v$ be the dominant eigenvector, $\mu_{v}$ the margin estimating node-signal collapse, and $X_{\mathrm{out}}^{(t)}$ the node-wise denoised output. Then, for all diffusion steps $t$: In particular, if $\gamma>2C$ then the denoiser outputs cannot collapse to $\mathrm{span}\{v\}$ at any reverse diffusion st

Figures (15)

Figure 1: Planar graphs generated via DeFoG with a simple GNN backbone and with a Graph Transformer backbone. The GNN fails to sample planar structure properly, instead producing several clustered communities.
Figure 2: The per-layer GenGNN framework, composed of modular components in the following order: Node (X) and Edge (y) features with RRWP (shown in orange), Edge Gating (EG), GNN/GINe/GCN layer, Node Gating (NG), Feed-Forward Networks (FFN), and Residuals+Normalization (RN). The blue and orange (RRWP) modules can be ablated; the yellow and green modules are always enabled during experimentation.
Figure 3: The top-5 relative inference speedups for GenGNN vs. PPGN and GT denoising backbones across permutations of tested datasets. Individual colors represent GT (blue) and PPGN (white), hatching corresponds to a given dataset.
Figure 4: The change in MMD and V.U.N. across individual ablated components of the GenGNN framework (log-scaled), with a simple GNN backbone (on right).
Figure 5: The change in Validity (top-left), Accuracy (bottom-left), MagDiff (top-center), and Accuracy (bottom-center) from layer depths 1 to 24 for the fully-enabled GenGNN and GT frameworks vs. GenGNN with RRWP and residual-normalization ablated, averaged over five runs. (top-right) The trade-off between MagDiff and Validity for the QM9 Dataset. (bottom-right) The trade-off between Average MMD Ratio and Accuracy for the Tree Dataset.
...and 10 more figures
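
The per-layer ordering given in the Figure 2 caption (edge gating, message passing, node gating, feed-forward network, residual plus normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gengnn_like_layer`, the scalar weights `w_gate` and `w_ffn`, and the use of sigmoid gates, a gated-sum aggregator, and a one-step ReLU feed-forward are all hypothetical stand-ins for learned modules, and the RRWP features are omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (roughly) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def gengnn_like_layer(x, edges, w_gate=1.0, w_ffn=0.5):
    """One hypothetical layer in the order EG -> message passing -> NG -> FFN -> RN.

    x:     list of node feature vectors (all of the same length)
    edges: dict mapping (src, dst) to a scalar edge feature
    w_gate, w_ffn: illustrative scalars standing in for learned weights
    """
    n, d = len(x), len(x[0])
    # 1. Edge gating (EG): squash each edge feature into a (0, 1) gate.
    gates = {e: sigmoid(w_gate * feat) for e, feat in edges.items()}
    # 2. Message passing: gated sum of source features at each destination.
    agg = [[0.0] * d for _ in range(n)]
    for (src, dst), g in gates.items():
        for k in range(d):
            agg[dst][k] += g * x[src][k]
    out = []
    for i in range(n):
        # 3. Node gating (NG): per-channel gate computed from the node's own features.
        h = [sigmoid(x[i][k]) * agg[i][k] for k in range(d)]
        # 4. Feed-forward network (FFN): a single scaled ReLU per channel.
        h = [max(0.0, w_ffn * v) for v in h]
        # 5. Residual + normalization (RN): add the layer input, then normalize.
        out.append(layer_norm([x[i][k] + h[k] for k in range(d)]))
    return out

# Tiny usage example: 3 nodes, 2 channels, a path 0 - 1 - 2 with directed edge pairs.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {(0, 1): 0.3, (1, 0): 0.3, (1, 2): -1.2, (2, 1): -1.2}
h = gengnn_like_layer(x, edges)
```

Because each stage is a separate step, an ablation in this sketch amounts to skipping that step, for example by returning the normalized input when the residual branch is disabled.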

Theorems & Definitions (11)

definition 1: Node-wise Structural Dispersion
theorem 1: Uniform non-collapse of residual-anchored graph denoisers
corollary 1: Anchored denoising robustness
lemma 1: Equivalence of oversmoothing measures (Scholkemper et al., 2024)
definition 2: Node-wise Structural Dispersion
lemma 2: Two-term lower bound in an orthogonal subspace
lemma 3: Positional Encoding anchor lower bounds residual
lemma 4: Outer Residual induces a deterministic non-collapse certificate
proposition 1: Per-step non-collapse bound for the denoiser
theorem 2: Uniform non-collapse of residual-anchored graph denoisers
...and 1 more
