Joint Relational Database Generation via Graph-Conditional Diffusion Models
Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann
TL;DR
This paper addresses synthetic relational database (RDB) generation by replacing autoregressive, table-by-table generation with a joint, graph-based framework. It introduces the Graph-Conditional Relational Diffusion Model (GRDM), which represents an RDB ${\mathcal{R}}$ as a graph ${\mathcal{G}}=(\mathcal{V},\mathcal{E},\mathcal{X})$ and factorizes its distribution as $p({\mathcal{G}}) = p({\mathcal{V}}, {\mathcal{E}})\, p({\mathcal{X}}|{\mathcal{V}}, {\mathcal{E}})$. GRDM first samples the structure $({\mathcal{V}}, {\mathcal{E}})$ with a node-degree-preserving random graph generator, then jointly denoises all node attributes ${\mathcal{X}}$ with a diffusion model in which a heterogeneous message-passing GNN (MP-GNN) predicts each node's noise vector conditioned on its local $K$-hop neighborhood. This design allows parallel, scalable sampling and captures long-range inter-table dependencies. Experiments on six real-world RDBs show substantial improvements in multi-hop fidelity metrics over autoregressive baselines while maintaining competitive single-table fidelity. The approach targets privacy-preserving generation of relational data and supports more flexible downstream uses of synthetic RDBs.
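To make the structure-sampling step $p({\mathcal{V}}, {\mathcal{E}})$ more concrete, here is a minimal sketch of one way to draw a random graph that preserves node degrees for a single parent-child relation. It is an illustrative, configuration-model-style shuffle under stated assumptions, not the paper's generator: the function name `sample_fk_edges`, the toy degree list, and the restriction to one foreign-key relation are all hypothetical.

```python
import random
from collections import Counter


def sample_fk_edges(parent_degrees, seed=0):
    """Sample child -> parent edges that preserve every parent's degree.

    parent_degrees[i] is the number of child rows that reference parent i
    in the real database. Each child row gets exactly one parent.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    # One "stub" per child slot: parent i appears parent_degrees[i] times.
    stubs = [p for p, d in enumerate(parent_degrees) for _ in range(d)]
    # Shuffling the stubs randomizes which child attaches to which parent
    # while leaving the multiset of parent degrees unchanged.
    rng.shuffle(stubs)
    return list(enumerate(stubs))  # list of (child_index, parent_index)


# Toy degree sequence: parent 0 has 3 children, parent 1 has none, etc.
real_degrees = [3, 0, 1, 2]
edges = sample_fk_edges(real_degrees)
sampled = Counter(parent for _, parent in edges)
sampled_degrees = [sampled.get(p, 0) for p in range(len(real_degrees))]
```

In this sketch the sampled graph has the same number of child rows and the same per-parent degree as the input; attributes would then be generated separately, conditioned on the sampled structure.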
Abstract
Building generative models for relational databases (RDBs) is important for applications like privacy-preserving data release and augmenting real datasets. However, most prior work either focuses on single-table generation or relies on autoregressive factorizations that impose a fixed table order and generate tables sequentially. Sequential generation limits parallelism, restricts flexibility in downstream applications like missing value imputation, and compounds errors due to commonly made conditional independence assumptions. We propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any order. Using a natural graph representation of RDBs, we introduce the Graph-Conditional Relational Diffusion Model (GRDM). GRDM leverages a graph neural network to jointly denoise row attributes and capture complex inter-table dependencies. Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics.
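The graph representation mentioned above can be illustrated with a small sketch. It assumes that each row becomes a node, each foreign-key reference becomes an undirected edge, and non-key columns become node attributes. The toy tables, the `foreign_keys` mapping, and the helpers `build_graph` and `k_hop` are hypothetical names chosen for illustration, not taken from the paper.

```python
from collections import defaultdict, deque

# Toy database (hypothetical): table name -> {primary key -> row}.
tables = {
    "customer": {1: {"age": 34}, 2: {"age": 51}},
    "order": {
        10: {"customer_id": 1, "total": 20.0},
        11: {"customer_id": 1, "total": 5.5},
        12: {"customer_id": 2, "total": 9.9},
    },
    "item": {
        100: {"order_id": 10, "qty": 2},
        101: {"order_id": 12, "qty": 1},
    },
}
# (child table, foreign-key column) -> parent table.
foreign_keys = {("order", "customer_id"): "customer", ("item", "order_id"): "order"}


def build_graph(tables, foreign_keys):
    """Return node attributes and an undirected adjacency map."""
    nodes = {}               # (table, pk) -> non-key attributes
    adj = defaultdict(set)   # (table, pk) -> neighboring nodes
    for table, rows in tables.items():
        fk_cols = {c: parent for (t, c), parent in foreign_keys.items() if t == table}
        for pk, row in rows.items():
            node = (table, pk)
            nodes[node] = {k: v for k, v in row.items() if k not in fk_cols}
            for col, parent in fk_cols.items():
                target = (parent, row[col])
                adj[node].add(target)
                adj[target].add(node)
    return nodes, adj


def k_hop(adj, start, k):
    """Breadth-first search: all nodes within k edges of start (inclusive)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


nodes, adj = build_graph(tables, foreign_keys)
neighborhood = k_hop(adj, ("customer", 1), 2)
```

Here the 2-hop neighborhood of a customer row reaches that customer's orders and the items in those orders, which is the kind of multi-hop, inter-table context that a graph-conditioned denoiser could draw on.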
