Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

Lingxiao Zhao, Xueying Ding, Leman Akoglu

TL;DR

Pard tackles permutation-invariant graph generation by combining blockwise autoregressive generation with diffusion-based denoising. A single discrete diffusion model, implemented with an equivariant network and shared across blocks, models each block's conditional distribution, so the resulting joint distribution over graphs is exchangeable (invariant to node relabeling). Two components make the framework practical: a higher-order transformer that is expressive yet memory-efficient, and a causal-transformer-based scheme that trains all blocks in parallel and scales to large datasets. Empirically, Pard achieves state-of-the-art or highly competitive results on molecular and generic graph benchmarks and scales to large datasets such as MOSES (1.9M molecules), positioning it as a candidate for a graph generative foundation model.
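
The blockwise factorization described above can be written compactly. The following is a sketch under assumed notation (the symbols $B_i$, $K$, and $G[\cdot]$ are introduced here for illustration and are not taken from the paper): a graph $G$ is split into ordered blocks $B_1, \dots, B_K$, and the same diffusion model $p_\theta$ is used for every conditional.

```latex
% Sketch only: notation is assumed, not quoted from the paper.
% B_i        : the i-th block (new nodes plus their edges to earlier blocks)
% G[B_{1:i-1}] : the subgraph induced by all earlier blocks
p_\theta(G) \;=\; \prod_{i=1}^{K} p_\theta\!\left( B_i \,\middle|\, G[B_{1:i-1}] \right)
```

If the block partition depends only on graph structure and each conditional is computed by an equivariant network, relabeling the nodes permutes the contents of each block but leaves every factor, and hence the product, unchanged.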

Abstract

Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that, contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates a transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules. PARD is open-sourced at https://github.com/LingxiaoShawn/Pard.
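
To make the idea of a structure-derived partial order concrete, here is a minimal Python sketch. It assumes a simple degree-peeling rule (repeatedly remove all nodes of minimum remaining degree, then generate blocks in the reverse of the removal order). The function name `degree_peeling_blocks` and the peeling rule itself are illustrative assumptions and may differ from the order defined in the paper; the point is only that a partition computed from structure alone commutes with node relabeling.

```python
def degree_peeling_blocks(num_nodes, edges):
    """Split nodes 0..num_nodes-1 into ordered blocks using graph structure only.

    Hypothetical illustration, not the paper's algorithm: repeatedly peel off
    every node whose degree among the remaining nodes is minimal, then return
    the peeled sets in reverse order (last peeled = first generated).
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(range(num_nodes))
    peeled = []
    while remaining:
        # Degree of each node counted only among nodes not yet peeled.
        deg = {v: len(adj[v] & remaining) for v in remaining}
        min_deg = min(deg.values())
        # Ties are kept together in one block, so no arbitrary order is imposed.
        block = {v for v in remaining if deg[v] == min_deg}
        peeled.append(frozenset(block))
        remaining -= block

    return list(reversed(peeled))


if __name__ == "__main__":
    # A path graph 0-1-2-3-4: the centre is generated first, the endpoints last.
    path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    for i, block in enumerate(degree_peeling_blocks(5, path_edges)):
        print(f"block {i}: {sorted(block)}")
```

Because tied nodes always land in the same block, relabeling the nodes only relabels the contents of each block; the block sequence itself is unchanged. Nodes inside a block remain unordered, which is where a permutation-equivariant model (in Pard, the shared diffusion model) is needed.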

Paper Structure (30 sections, 29 equations, 4 figures, 10 tables, 3 algorithms)

This paper contains 30 sections, 29 equations, 4 figures, 10 tables, 3 algorithms.

Figures (4)

  • Figure 1: The Architecture of the PPGN-Transformer Block. In (b) and (c) we provide illustrations of how edge and node features are processed through Transformer and PPGN blocks.
  • Figure 2: Non-curated QM9 graphs (with explicit hydrogens) generated by Pard trained with 20 steps per block.
  • Figure 3: Non-curated grid graphs generated by Pard trained with 50 steps per block.
  • Figure 4: Non-curated grid graphs generated by Pard (with eigenvector) trained with 50 steps per block.

Theorems & Definitions (3)

  • proof
  • Definition A.1: Graph Transformation
  • Definition A.2: Equivariant Graph Transformation