DeFoG: Discrete Flow Matching for Graph Generation
Yiming Qin, Manuel Madeira, Dorina Thanou, Pascal Frossard
TL;DR
DeFoG is a discrete flow matching framework for graph generation that decouples training from sampling, two stages that are tightly coupled in graph diffusion models. This enables flexible, efficient sampling through a continuous-time Markov chain (CTMC) denoising process. A permutation-equivariant neural network predicts the marginal distributions of the clean nodes and edges, and the rate matrix is designed separately at sampling time. With this setup, DeFoG achieves state-of-the-art performance on synthetic and molecular graphs using only a fraction of the sampling steps that diffusion models require. The authors provide theoretical guarantees linking the training objective to the sampling dynamics and explore a rich design space (time distortions, target guidance, and stochastic priors), supported by extensive ablations and conditional generation experiments. Overall, DeFoG offers a scalable, flexible approach to high-quality graph generation with theoretical grounding and practical efficiency gains.
Abstract
Graph generative models are essential across diverse scientific domains because they capture complex distributions over relational data. Among them, graph diffusion models achieve superior performance but face inefficient sampling and limited flexibility due to the tight coupling between training and sampling stages. We introduce DeFoG, a novel graph generative framework that disentangles sampling from training, enabling a broader design space for more effective and efficient model optimization. DeFoG employs a discrete flow-matching formulation that respects the inherent symmetries of graphs. We theoretically ground this disentangled formulation by explicitly relating the training loss to the sampling algorithm and showing that DeFoG faithfully replicates the ground-truth graph distribution. Building on these foundations, we thoroughly investigate DeFoG's design space and propose novel sampling methods that significantly enhance performance and reduce the required number of refinement steps. Extensive experiments demonstrate state-of-the-art performance across synthetic, molecular, and digital pathology datasets, covering both unconditional and conditional generation settings. DeFoG also outperforms most diffusion-based models while using just 5-10% of their sampling steps.
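To make the decoupling of training and sampling concrete, here is a minimal sketch of discrete flow matching on a flat vector of categorical variables (standing in for node or edge labels). It is not the authors' implementation. It assumes a linear-interpolation noising process with a uniform prior, for which one valid rate moves a variable toward a predicted clean value at rate 1/(1 - t). The names `noise`, `sample`, `denoiser`, and `distort` are hypothetical, and the permutation-equivariant network is replaced by an arbitrary callable that returns per-variable marginals.

```python
import numpy as np

def noise(z1, t, num_states, rng):
    """Training-side corruption (assumed form): keep each clean label with
    probability t, otherwise resample it from a uniform prior."""
    keep = rng.random(z1.shape) < t
    z0 = rng.integers(0, num_states, size=z1.shape)
    return np.where(keep, z1, z0)

def sample(denoiser, num_vars, num_states, num_steps, rng, distort=lambda t: t):
    """Sampling-side Euler simulation of the CTMC from t = 0 to t = 1.

    `denoiser(z, t)` must return an array of shape (num_vars, num_states)
    with the predicted clean-label marginals for each variable.
    `num_steps` and `distort` are chosen here, independently of training.
    """
    ts = distort(np.linspace(0.0, 1.0, num_steps + 1))
    z = rng.integers(0, num_states, size=num_vars)  # draw from the uniform prior
    idx = np.arange(num_vars)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t
        p1 = denoiser(z, t)
        # Off-diagonal jump probabilities: dt * p1(j) / (1 - t), capped so
        # that each row remains a valid probability distribution.
        probs = p1 * min(1.0, dt / (1.0 - t))
        probs[idx, z] = 0.0
        probs[idx, z] = 1.0 - probs.sum(axis=1)  # probability of staying put
        # Inverse-CDF sampling of the next state for every variable at once.
        cum = probs.cumsum(axis=1)
        cum[:, -1] = 1.0  # guard against floating-point round-off
        u = rng.random(num_vars)
        z = (u[:, None] < cum).argmax(axis=1)
    return z

# Usage with a stand-in "oracle" denoiser that always predicts a fixed target.
rng = np.random.default_rng(0)
target = rng.integers(0, 3, size=12)
oracle = lambda z, t: np.eye(3)[target]
few_steps = sample(oracle, num_vars=12, num_states=3, num_steps=5, rng=rng,
                   distort=lambda t: 2 * t - t**2)  # illustrative distortion
```

In this sketch a trained network would be fitted on pairs `(noise(z1, t, ...), z1)` with a cross-entropy loss, while the step count and the time distortion are picked only at sampling time. The distortion `2 * t - t**2` is just an illustrative choice that places more steps near t = 1.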
