Discrete Graph Auto-Encoder

Yoann Boget; Magda Gregorova; Alexandros Kalousis

TL;DR

This paper addresses graph generation when no canonical node ordering exists. It introduces the Discrete Graph Auto-Encoder (DGAE), which first maps graphs to sets of discrete node embeddings with a permutation-equivariant encoder, then models the distribution of these sets by sorting them into sequences and applying a 2D autoregressive Transformer. The two-stage framework uses feature augmentation with $p$-path features and partitioned vector quantization to build a discrete latent space with known support, which makes the latent distribution efficient to learn. Experiments on simple graphs and molecular datasets show performance competitive with the state of the art on distributional metrics (e.g., NSPDK, FCD) and substantial generation speed gains over baselines, and ablations validate the benefits of the proposed augmentations and codebook design. The work contributes a novel combination of graph-to-set encoding, discrete latent modeling, and a two-dimensional Transformer for graph generation, offering a scalable path to generic graph synthesis beyond domain-specific representations.

Abstract

Despite advances in generative methods, accurately modeling the distribution of graphs remains a challenging task, primarily because graphs have no predefined or inherent unique representation. Two main strategies have emerged to tackle this issue: 1) restricting the number of possible representations by sorting the nodes, or 2) using permutation-invariant/equivariant functions, specifically Graph Neural Networks (GNNs). In this paper, we introduce a new framework named Discrete Graph Auto-Encoder (DGAE), which leverages the strengths of both strategies and mitigates their respective limitations. In essence, we propose a two-step strategy. In the first step, we use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations, each node being represented by a sequence of quantized vectors. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model based on the Transformer architecture. Through multiple experimental evaluations, we demonstrate that our model performs competitively with the existing state of the art across various datasets. Ablation studies further support the value of our method.
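
The sorting step described in the abstract can be illustrated with a minimal sketch. It assumes each node is represented by a row of $C$ integer codebook indices and that the set is ordered lexicographically; the abstract does not state the exact ordering, so the lexicographic choice and the name `sort_code_set` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sort_code_set(codes: np.ndarray) -> np.ndarray:
    """Sort the rows of an (n_nodes, C) integer array lexicographically.

    Assumption: lexicographic order over the C codebook indices of each node.
    The ordering actually used by DGAE may differ.
    """
    # np.lexsort treats the LAST key as the primary one, so the columns are
    # reversed to make column 0 the primary sort key.
    order = np.lexsort(codes.T[::-1])
    return codes[order]

# The same set of node codes, presented under two different node orderings.
rng = np.random.default_rng(0)
codes = rng.integers(0, 4, size=(6, 2))
shuffled = codes[rng.permutation(len(codes))]

# Both orderings map to the same sequence after sorting.
assert np.array_equal(sort_code_set(codes), sort_code_set(shuffled))
```

Because the sorted sequence depends only on the set of codes and not on the original node order, an order-sensitive model such as an auto-regressive Transformer can be trained on it without having to account for every node permutation.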

Paper Structure (70 sections, 18 equations, 13 figures, 16 tables)


Introduction
Background
Notation
Graph isomorphism
Generative models and graph isomorphism
Sequentialization
Invariance to permutation
Graph neural networks
Message Passing Neural Networks
Message Passing Layer
Limitations
Related work
Sequential generation
Models invariant to permutations
Graph Auto-Encoders
...and 55 more sections

Figures (13)

Figure 1: Diagram of our auto-encoder. 1. The encoder is an MPNN transforming the graph into a set of node embeddings $\mathcal{Z}^h$. 2. The elements of the set $\mathcal{Z}^h$ are partitioned and quantized, producing a set of codeword sequences $\mathcal{Z}^q$. 3. The decoder, another MPNN, takes the set $\mathcal{Z}^q$ and reconstructs the original graph.
Figure 2: Diagram of the quantization. We represent each node embedding by $C$ partition vectors ${\bm{z}}^h_{i, c}$. Then, we quantize each of these vectors by replacing it with its nearest neighbor in the corresponding codebook $H_c$. The vectors in the codebooks are parameters learned during training.
Figure 3: The lines represent the average over three runs and the shaded area the standard deviation.
Figure 4: Effect of the codebook size and the partitioning on the dictionary usage. We report the normalized perplexity averaged over three runs. The black lines indicate the standard deviations.
Figure 5: Effect of the codebook size and the partitioning on reconstruction (left) and generation (right). We report the best reconstruction loss and the best NSPDK averaged over 3 runs. The black lines indicate the standard deviations.
...and 8 more figures
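
The quantization described in the Figure 2 caption, and the normalized perplexity reported in Figure 4, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the names `partition_quantize` and `normalized_perplexity` are hypothetical, the nearest neighbor is taken under squared Euclidean distance, and normalized perplexity is taken to be the exponential of the codeword-usage entropy divided by the codebook size $K$. The captions do not give these definitions, and the training of the codebooks is omitted.

```python
import numpy as np

def partition_quantize(z_h: np.ndarray, codebooks: np.ndarray):
    """Quantize node embeddings partition by partition.

    z_h:       (n, C * d) node embeddings, each split into C vectors of size d.
    codebooks: (C, K, d) array, one codebook H_c of K vectors per partition.
    Returns the indices (n, C) and the quantized embeddings (n, C * d).
    """
    n = z_h.shape[0]
    C, K, d = codebooks.shape
    parts = z_h.reshape(n, C, d)
    # Squared Euclidean distance from every partition vector to every
    # codeword of its own codebook: shape (n, C, K).
    diff = parts[:, :, None, :] - codebooks[None, :, :, :]
    dist = (diff ** 2).sum(axis=-1)
    idx = dist.argmin(axis=-1)
    # Look up the selected codeword in codebook c for each partition c.
    z_q = codebooks[np.arange(C)[None, :], idx]
    return idx, z_q.reshape(n, C * d)

def normalized_perplexity(indices: np.ndarray, K: int) -> float:
    """exp(entropy of codeword usage) / K for one codebook (assumed definition)."""
    counts = np.bincount(indices.ravel(), minlength=K).astype(float)
    p = counts / counts.sum()
    nz = p[p > 0]  # skip unused codewords to avoid log(0)
    entropy = -(nz * np.log(nz)).sum()
    return float(np.exp(entropy) / K)

# Toy example: C = 2 partitions, K = 2 codewords each, d = 1.
codebooks = np.array([[[0.0], [1.0]], [[10.0], [20.0]]])
z_h = np.array([[0.1, 19.0], [0.9, 11.0]])
idx, z_q = partition_quantize(z_h, codebooks)
usage = normalized_perplexity(idx[:, 0], K=2)
```

With $C$ codebooks of $K$ codewords each, a node can take at most $K^C$ distinct discrete codes, which is one way to read the "known support" mentioned in the TL;DR. Under the definition assumed here, a normalized perplexity of 1 means all codewords of a codebook are used equally often, and $1/K$ means only one codeword is ever used.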
