Sparse Training of Discrete Diffusion Models for Graph Generation
Yiming Qin, Clement Vignac, Pascal Frossard
TL;DR
This work tackles the quadratic bottleneck in diffusion-based graph generation by leveraging edge-list representations and sparsity. It introduces SparseDiff, a discrete diffusion framework with a sparsity-preserving noise model, a loss computed on a subset of edges, and a sparse graph transformer trained on random query edges; sampling iteratively fills the adjacency matrix one subset of edges at a time. Empirical results show state-of-the-art or competitive performance on small molecules (QM9, Moses) and large graphs (SBM, Planar, Ego, Protein), with faster training convergence on large graphs (a fourfold speedup on Ego compared to dense models). The approach broadens the practical scalability of discrete graph diffusion, enabling generation of significantly larger and more complex graphs than prior dense methods.
Abstract
Generative graph models struggle to scale because they must predict the existence or type of an edge between every pair of nodes. To avoid the resulting quadratic complexity, existing scalable models often impose restrictive assumptions, such as a cluster structure within graphs, which limits their applicability. We instead introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse. By selecting a subset of edges, SparseDiff leverages sparse graph representations both during the noising process and within the denoising network, so that space complexity scales linearly with the number of chosen edges. During inference, SparseDiff progressively fills the adjacency matrix with the selected subsets of edges, mirroring the training process. Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets, confirming its effectiveness and robustness across varying graph sizes. It also converges faster, particularly on larger graphs, achieving a fourfold speedup on the large Ego dataset compared to dense models, thereby paving the way for broader applications.
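
To make the idea of progressively filling the adjacency matrix concrete, the following is a minimal Python sketch of a single filling pass, not the authors' implementation. It assumes that all node pairs are randomly partitioned into query subsets and that each subset is scored independently. The function names (`partition_query_edges`, `placeholder_denoiser`, `fill_adjacency`), the `edge_fraction` parameter, and the random stand-in for the denoising network are all hypothetical. For simplicity the sketch enumerates every node pair up front, which a real implementation would likely avoid.

```python
import itertools
import random


def partition_query_edges(num_nodes, edge_fraction, rng):
    """Randomly split all unordered node pairs into disjoint query subsets.

    Each subset holds roughly `edge_fraction` of all pairs (hypothetical
    parameter), so one step only has to handle that many pairs.
    """
    pairs = list(itertools.combinations(range(num_nodes), 2))
    rng.shuffle(pairs)
    chunk = max(1, int(edge_fraction * len(pairs)))
    return [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]


def placeholder_denoiser(query_pairs, rng, edge_prob=0.1):
    """Stand-in for a denoising network (assumption, not the real model).

    Returns one boolean per query pair: True if an edge is predicted.
    """
    return [rng.random() < edge_prob for _ in query_pairs]


def fill_adjacency(num_nodes, edge_fraction=0.25, seed=0):
    """Fill the graph subset by subset, storing only the predicted edges."""
    rng = random.Random(seed)
    edges = set()   # sparse edge-list representation
    visited = 0     # number of node pairs queried so far
    for query in partition_query_edges(num_nodes, edge_fraction, rng):
        predictions = placeholder_denoiser(query, rng)
        edges.update(pair for pair, keep in zip(query, predictions) if keep)
        visited += len(query)
    return edges, visited


if __name__ == "__main__":
    edges, visited = fill_adjacency(num_nodes=10)
    print(f"queried {visited} node pairs, kept {len(edges)} edges")
```

Each pass touches every node pair exactly once, but any single step only handles one query subset plus the edges kept so far, which is the property the abstract relies on for memory that scales with the number of chosen edges.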
