Sparse Training of Discrete Diffusion Models for Graph Generation

Yiming Qin; Clement Vignac; Pascal Frossard

TL;DR

This work tackles the quadratic bottleneck in diffusion-based graph generation by leveraging edge-list representations and sparsity. It introduces SparseDiff, a discrete diffusion framework with a sparsity-preserving noise model, a loss on a subset of edges, and a sparse graph transformer trained on random query edges; sampling proceeds iteratively to fill the adjacency matrix. Empirical results show state-of-the-art or competitive performance on small molecules (QM9, Moses) and large graphs (SBM, Planar, Ego, Protein), with notable training efficiency on large graphs. The approach broadens the practical scalability of discrete graph diffusion, enabling generation of significantly larger and more complex graphs than prior dense methods.

Abstract

Generative graph models struggle to scale because they must predict the existence or type of an edge between every pair of nodes. To avoid the resulting quadratic complexity, existing scalable models often impose restrictive assumptions, such as a cluster structure within graphs, which limits their applicability. We instead introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse. By selecting a subset of edges, SparseDiff leverages sparse graph representations both during the noising process and within the denoising network, so that space complexity scales linearly with the number of chosen edges. During inference, SparseDiff progressively fills the adjacency matrix with the selected subsets of edges, mirroring the training process. Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets, confirming its effectiveness and robustness across varying graph sizes. It also converges faster, particularly on larger graphs, achieving a fourfold speedup on the large Ego dataset compared to dense models, thereby paving the way for broader applications.
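The inference idea described in the abstract, filling the adjacency matrix one subset of edges at a time, can be illustrated with a minimal sketch. The function names, the random partitioning scheme, and the `predict_edges` callable (a stand-in for the denoising network) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def query_partitions(n, lam, rng):
    """Split all unordered node pairs (i < j) into random chunks.

    Each chunk holds about lam * (number of pairs) pairs; the last one
    may be smaller. Self-loops are excluded.
    """
    rows, cols = np.triu_indices(n, k=1)
    pairs = np.stack([rows, cols], axis=1)
    perm = rng.permutation(len(pairs))
    chunk = max(1, int(np.ceil(lam * len(pairs))))
    return [pairs[perm[s:s + chunk]] for s in range(0, len(pairs), chunk)]

def fill_adjacency(n, lam, predict_edges, rng):
    """Fill a symmetric adjacency matrix chunk by chunk.

    predict_edges is a hypothetical callable that maps an array of
    query pairs (shape [q, 2]) to one integer edge label per pair.
    """
    adj = np.zeros((n, n), dtype=np.int64)
    for q in query_partitions(n, lam, rng):
        labels = predict_edges(q)
        adj[q[:, 0], q[:, 1]] = labels
        adj[q[:, 1], q[:, 0]] = labels  # keep the matrix symmetric
    return adj
```

In this sketch only one chunk of pairs is scored at a time, so the memory needed per call depends on the chunk size rather than on the full number of node pairs.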

Paper Structure (39 sections, 2 theorems, 7 equations, 10 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Denoising diffusion models for graphs
Scalable Graph Generation
Subgraph Aggregation
Hierarchical Refinement
SparseDiff: Sparse Denoising Diffusion for Large Graph Generation
Sparsity-preserving noise model
Prediction on a subset of pairs
Sparse Message-Passing Transformer
A first approach: graph learning as a link prediction problem
Second approach: learning representations for edges
Architecture
Sampling
Experiments
...and 24 more sections

Key Result

Lemma 3.1

(High-probability bound on the sparsity of the noisy graph) Consider a graph with $n$ nodes and $m$ edges. We denote by $k$ the edge ratio $m / (n (n-1) / 2)$. Let $m_t$ denote the number of edges in the noisy graph $G^t$ sampled from the marginal transition model. Then, for $n$ sufficiently large a
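The edge ratio $k = m / (n(n-1)/2)$ used in the lemma statement is the fraction of possible node pairs that carry an edge. As a small worked example, the helper below (a hypothetical name, not from the paper) computes it for an undirected graph without self-loops.

```python
def edge_ratio(n, m):
    """Edge ratio k = m / (n (n - 1) / 2) for n nodes and m edges."""
    if n < 2:
        raise ValueError("need at least two nodes")
    return m / (n * (n - 1) / 2)

# A graph with 100 nodes has 4950 possible pairs; with 200 edges,
# the edge ratio is 200 / 4950, roughly 4% of all pairs.
k = edge_ratio(100, 200)
```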

Figures (10)

Figure 1: Samples from SparseDiff trained on large graphs.
Figure 2: Overview of SparseDiff. In order to train a denoising neural network without considering all pairs of nodes, SparseDiff combines i) a noise model that preserves sparsity during diffusion; ii) a graph transformer $\phi_\theta$ implemented within the message-passing framework; iii) a loss function computed on a subset ${\bm{E}}_q$ of all pairs of nodes. Together, these components allow for using edge lists and training diffusion models on significantly larger graphs than dense methods.
Figure 3: Definition of the noisy graph $G^t$, the query graph $G_q$, and the computational graph $G_c$, with an edge proportion $\lambda=0.16$. The noisy graph $G^t$ is the result of our sparsity-preserving noising process, the query graph $G_{q}$ consists of a fraction $\lambda$ of randomly chosen edges, and the computational graph $G_{c}$ is the union of the noisy and query graphs. Self-loops are not included in the calculation.
Figure 4: Visualization of the iterative sampling process, with a query edge proportion
Figure 5: Visualization for Moses dataset.
...and 5 more figures
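The three graphs named in the Figure 3 caption can be sketched with plain edge sets: the query graph is a fraction $\lambda$ of randomly chosen node pairs, and the computational graph is the union of the noisy and query graphs. The function name, the uniform sampling of pairs, and the set-based representation are assumptions for illustration only.

```python
import numpy as np

def computational_graph(noisy_edges, n, lam, rng):
    """Return (query_edges, computational_edges) as sets of (i, j), i < j.

    noisy_edges: iterable of undirected edges of the noisy graph G^t.
    lam: fraction of all node pairs (self-loops excluded) to query.
    """
    rows, cols = np.triu_indices(n, k=1)
    num_pairs = len(rows)
    num_query = int(round(lam * num_pairs))
    idx = rng.choice(num_pairs, size=num_query, replace=False)
    query = {(int(rows[i]), int(cols[i])) for i in idx}
    # Normalise the noisy edges so that each pair is stored with i < j.
    noisy = {(min(u, v), max(u, v)) for u, v in noisy_edges}
    return query, noisy | query
```

The union may be smaller than the sum of the two edge counts, because a queried pair can already be an edge of the noisy graph.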

Theorems & Definitions (2)

Lemma 3.1
Proposition B.1
