Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis
Kai Chen, Chen Gong, Tianhao Wang
TL;DR
The paper tackles differentially private (DP) tabular data synthesis and challenges the notion that statistical methods are universally superior. It introduces MargNet, a neural-network framework that uses adaptive marginal selection to learn and generate data consistent with the selected marginals under DP guarantees, with the aim of handling densely correlated attributes. The authors provide theoretical bounds on marginal fitting errors and show through extensive experiments that MargNet achieves strong utility on densely correlated data, competitive performance on sparsely correlated data with significant speedups, and strengths complementary to those of the statistical method AIM. The work suggests that algorithm choice should be dataset-dependent and that neural-network-based approaches, when carefully designed around marginal information, can offer scalable, high-utility DP synthesis for complex tabular data.
Abstract
In differentially private (DP) tabular data synthesis, the consensus is that statistical models are better than neural network (NN)-based methods. However, we argue that this conclusion is incomplete and overlooks the challenge of densely correlated datasets, where intricate dependencies can overwhelm statistical models. In such complex scenarios, neural networks are more suitable due to their capacity to fit complex distributions by learning directly from samples. Despite this potential, existing NN-based algorithms still suffer from significant limitations. We therefore propose MargNet, which incorporates successful algorithmic designs of statistical models into neural networks. MargNet applies an adaptive marginal selection strategy and trains the neural networks to generate data that conforms to the selected marginals. On sparsely correlated datasets, our approach achieves utility close to that of the best statistical method while offering an average 7× speedup over it. More importantly, on densely correlated datasets, MargNet establishes a new state of the art, reducing fidelity error by up to 26% compared to the previous best method. We release our code on GitHub: https://github.com/KaiChen9909/margnet
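To make the idea of "data that conforms to the selected marginals" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes integer-encoded categorical columns, adds Gaussian noise to a marginal's cell counts, and scores a dataset by the total variation distance between its marginal and the noisy target. The names `noisy_marginal`, `marginal_l1_error`, `domain`, and `sigma` are hypothetical, and `sigma` is not calibrated to any privacy budget here.

```python
import numpy as np

def count_marginal(data, cols, domain):
    """Exact contingency table of `data` over the columns in `cols`."""
    shape = tuple(domain[c] for c in cols)
    counts = np.zeros(shape)
    # np.add.at accumulates correctly when the same cell is hit repeatedly.
    np.add.at(counts, tuple(data[:, c] for c in cols), 1.0)
    return counts

def noisy_marginal(data, cols, domain, sigma, rng):
    """Marginal counts plus i.i.d. Gaussian noise (assumed mechanism)."""
    counts = count_marginal(data, cols, domain)
    return counts + rng.normal(0.0, sigma, size=counts.shape)

def marginal_l1_error(synth, cols, domain, target):
    """Total variation distance between synth's marginal and a noisy target."""
    p = count_marginal(synth, cols, domain)
    p = p / p.sum()
    # Noise can push cells below zero; clip before normalizing.
    q = np.clip(target, 0.0, None)
    q = q / max(q.sum(), 1e-12)
    return 0.5 * np.abs(p - q).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.integers(0, 3, size=(500, 2))   # toy data, two 3-valued columns
    synth = rng.integers(0, 3, size=(500, 2))  # stand-in for generator output
    target = noisy_marginal(real, (0, 1), [3, 3], sigma=5.0, rng=rng)
    print(marginal_l1_error(synth, (0, 1), [3, 3], target))
```

In a marginal-based training loop, a discrepancy of this kind would typically be summed over the selected marginals and minimized with respect to the generator; how MargNet selects marginals and defines its loss is specified in the paper and the released code, not here.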
