Large Generative Graph Models
Yu Wang, Ryan A. Rossi, Namyong Park, Huiyuan Chen, Nesreen K. Ahmed, Puja Trivedi, Franck Dernoncourt, Danai Koutra, Tyler Derr
TL;DR
This work introduces Large Graph Generative Models (LGGMs), the first framework to pre-train graph generators on a large, multi-domain corpus (over $5000$ graphs from $13$ domains) to learn transferable graph priors. Leveraging discrete denoising diffusion with forward transitions and a text-conditioned objective, LGGMs achieve strong zero-shot generalization and robust fine-tuning, outperforming prior single-domain graph models such as DiGress in most settings. A key novelty is Text-to-Graph generation, in which prompts describing a graph's domain, name, or statistics guide graph synthesis through a neural conditioner, enabling fine-grained control over properties such as average degree and clustering coefficient. The results demonstrate practical benefits for cross-domain graph generation and customization. The authors release code, model checkpoints, and datasets to foster community development and downstream applications.
Abstract
Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on huge amounts of text, images, video, and audio drawn from numerous, highly diverse domains. This paradigm of training on diverse, well-curated data lies at the heart of generating creative and sensible content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained on only one dataset at a time, and so cannot replicate the revolutionary success that LGMs have achieved in other fields. To remedy this crucial gap, we propose a new class of graph generative model called Large Graph Generative Model (LGGM) that is trained on a large corpus of graphs (over 5000 graphs) from 13 different domains. We empirically demonstrate that the pre-trained LGGM has zero-shot generative capability superior to that of existing graph generative models. Furthermore, our pre-trained LGGM can be easily fine-tuned with graphs from target domains and then demonstrates even better performance than models trained directly from scratch, making it a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGM with the capability to generate graphs given text prompts (Text-to-Graph), such as a description of the network name and domain (e.g., "The power-1138-bus graph represents a network of buses in a power distribution system.") or of network statistics (e.g., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge in the underlying language model, offering users fine-grained control of the generated graphs. We release the code, the model checkpoint, and the datasets at https://lggm-lg.github.io/.
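
To make the statistics-based prompts concrete, the following is a minimal sketch of how the two statistics mentioned above (average degree and clustering coefficient) might be computed from an edge list and turned into a short text description. It is an illustration only, not the authors' pipeline: the function names `graph_stats` and `describe`, the low/high cutoffs, and the prompt template are all hypothetical.

```python
from itertools import combinations


def graph_stats(edges):
    """Return (average degree, average local clustering coefficient)
    for an undirected simple graph given as a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n == 0:
        return 0.0, 0.0
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient is taken as 0 for degree < 2
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return avg_degree, total / n


def describe(edges, degree_cut=3.0, clustering_cut=0.3):
    """Build a prompt-like sentence. The cutoffs and the wording are
    hypothetical choices for illustration, not values from the paper."""
    avg_deg, avg_cc = graph_stats(edges)
    deg_word = "low" if avg_deg < degree_cut else "high"
    cc_word = "low" if avg_cc < clustering_cut else "high"
    return (f"The graph has a {deg_word} average degree "
            f"and a {cc_word} clustering coefficient.")


# A triangle (0, 1, 2) with one pendant node attached to node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(graph_stats(example_edges))
print(describe(example_edges))
```

For the small example graph, every node's degree and local clustering coefficient can be checked by hand, which makes it easy to see how a numeric statistic maps onto a qualitative word in the prompt.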
