Joint Relational Database Generation via Graph-Conditional Diffusion Models

Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann

TL;DR

We address synthetic relational database (RDB) generation by replacing autoregressive, table-by-table generation with a joint, graph-based framework. The paper introduces the Graph-Conditional Relational Diffusion Model (GRDM), which first samples a structure-preserving graph ${\mathcal{G}}=(\mathcal{V},\mathcal{E},\mathcal{X})$ representing the RDB ${\mathcal{R}}$, and then jointly denoises the node attributes $\mathcal{X}$ with a diffusion model in which each node is conditioned on its local $K$-hop neighborhood. This corresponds to the factorization $p({\mathcal{G}}) = p({\mathcal{V}}, {\mathcal{E}}) p({\mathcal{X}}|{\mathcal{V}}, {\mathcal{E}})$. The method uses a node-degree-preserving random graph generator for the structure and a heterogeneous message-passing GNN (MP-GNN) to predict the noise vectors, which enables parallel, scalable sampling and captures long-range inter-table dependencies. Experiments on six real-world RDBs show substantial improvements in multi-hop fidelity metrics over autoregressive baselines while maintaining competitive single-table fidelity. This approach supports privacy-preserving generation of relational data and more scalable, flexible downstream analyses of synthetic RDBs.
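To make the structure term $p({\mathcal{V}}, {\mathcal{E}})$ more concrete, the following is a minimal sketch of one way a node-degree-preserving random generator could work. It is an illustration under stated assumptions, not the paper's implementation: we assume edges come from foreign-key columns of a child table, and the function name `resample_foreign_keys` and the toy column names are hypothetical. Shuffling each foreign-key column independently keeps the number of child rows attached to every parent row unchanged while randomizing which rows are linked.

```python
import random
from collections import Counter


def resample_foreign_keys(fk_columns, seed=None):
    """Return a degree-preserving random rewiring of foreign-key columns.

    fk_columns maps a (hypothetical) foreign-key column name to the list of
    parent ids referenced by each child row. Each column is permuted
    independently, so every parent id keeps its original number of children.
    """
    rng = random.Random(seed)
    resampled = {}
    for name, column in fk_columns.items():
        shuffled = list(column)  # copy so the input is left untouched
        rng.shuffle(shuffled)    # permute which child row points to which parent
        resampled[name] = shuffled
    return resampled


# Toy child table with two foreign keys (assumed example data).
observed = {
    "customer_id": [1, 1, 2, 3, 3, 3],
    "product_id": [7, 8, 8, 8, 9, 7],
}
sampled = resample_foreign_keys(observed, seed=0)
```

In this sketch the sampled structure has the same number of rows and the same per-parent degrees as the observed one, so only the attributes remain to be generated by the diffusion model.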

Abstract

Building generative models for relational databases (RDBs) is important for applications like privacy-preserving data release and augmenting real datasets. However, most prior work either focuses on single-table generation or relies on autoregressive factorizations that impose a fixed table order and generate tables sequentially. This approach limits parallelism, restricts flexibility in downstream applications like missing value imputation, and compounds errors due to commonly made conditional independence assumptions. We propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any order. By using a natural graph representation of RDBs, we propose the Graph-Conditional Relational Diffusion Model (GRDM). GRDM leverages a graph neural network to jointly denoise row attributes and capture complex inter-table dependencies. Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics.

Paper Structure

This paper contains 47 sections, 27 equations, 2 figures, 3 tables, 2 algorithms.

Figures (2)

  • Figure 1: Comparison of autoregressive and joint relational database generation.
  • Figure 2: Tabular and graph representations of relational databases. We use different colours and different arrow shapes to depict different node and edge types, respectively.
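
The graph representation described in Figure 2 can be sketched in a few lines. This is a hedged illustration, assuming that each row becomes a node typed by its table, each foreign-key reference becomes an undirected edge, and the remaining columns become node attributes; the toy tables and the helper names `build_graph` and `k_hop_neighborhood` are hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque

# Hypothetical toy RDB: table name -> {primary key: row}.
tables = {
    "customer": {1: {"age": 34}, 2: {"age": 51}},
    "order": {
        10: {"total": 20.0, "customer_id": 1},
        11: {"total": 5.5, "customer_id": 1},
        12: {"total": 9.9, "customer_id": 2},
    },
}
# Each entry: (child table, foreign-key column, parent table).
foreign_keys = [("order", "customer_id", "customer")]


def build_graph(tables, foreign_keys):
    """Turn rows into typed nodes (table, pk) and foreign keys into edges."""
    fk_cols = {(child, col) for child, col, _ in foreign_keys}
    # Node attributes X: every column that is not a foreign key.
    features = {
        (table, pk): {k: v for k, v in row.items() if (table, k) not in fk_cols}
        for table, rows in tables.items()
        for pk, row in rows.items()
    }
    # Edges E: one undirected edge per foreign-key reference.
    adjacency = defaultdict(set)
    for child, col, parent in foreign_keys:
        for pk, row in tables[child].items():
            u, v = (child, pk), (parent, row[col])
            adjacency[u].add(v)
            adjacency[v].add(u)
    return features, adjacency


def k_hop_neighborhood(adjacency, source, k):
    """Breadth-first search returning all nodes within k hops of source."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


features, adjacency = build_graph(tables, foreign_keys)
two_hop = k_hop_neighborhood(adjacency, ("order", 10), 2)
```

Here the 2-hop neighborhood of order 10 contains its customer and that customer's other order, which is the kind of multi-hop context a denoiser conditioned on a $K$-hop neighborhood would see.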