Private Evolution Converges
Tomás González, Giulia Fanti, Aaditya Ramdas
TL;DR
This work revisits Private Evolution (PE), a training-free approach for differentially private synthetic data, and develops a realistic convergence theory that avoids prior unrealistic multiplicity assumptions. By introducing a tractable Euclidean-space variant of PE with a $D_{BL}$-projected nearest-neighbor histogram, the authors prove a worst-case $1$-Wasserstein convergence bound of the form $\mathbb{E}[W_1(\mu_S, \mu_{S_T})] \le \tilde{O}(d D \sigma^{1/d})$ under DP, with $\sigma$ tied to the Gaussian mechanism and privacy parameters. The analysis extends to Banach spaces, clarifies the relationship between PE and the Private Signed Measure Mechanism (PSMM), and shows how practical PE naturally implements a sequential version of PSMM. Empirical results on synthetic data and CIFAR-10-like tasks corroborate the theory and provide guidance on hyperparameter choices (e.g., the number of evolution steps $T$ and synthetic sample count $n_s$). The work deepens the theoretical understanding of DP synthetic data generation and offers concrete, principled settings to deploy PE effectively in practice.
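To make the nearest-neighbor histogram with Gaussian noise concrete, the following is a minimal sketch of one voting-and-resampling step, not the authors' implementation. The function names `noisy_nn_histogram` and `resample` are hypothetical, the calibration of `sigma` to the privacy budget is left out, and clipping negative counts to zero is a simple stand-in assumption rather than the $D_{BL}$ projection analyzed in the paper.

```python
import numpy as np


def noisy_nn_histogram(private, candidates, sigma, rng):
    """Each private point votes for its nearest candidate; Gaussian noise is added.

    private:    (n, d) array of sensitive points
    candidates: (m, d) array of current synthetic candidates
    sigma:      noise scale (assumed to be calibrated elsewhere to the DP budget)
    """
    # Pairwise squared Euclidean distances, shape (n, m).
    sq_dists = ((private[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest candidate for every private point.
    nearest = sq_dists.argmin(axis=1)
    # Vote counts per candidate.
    votes = np.bincount(nearest, minlength=len(candidates)).astype(float)
    # Gaussian mechanism: independent noise on every histogram entry.
    return votes + rng.normal(0.0, sigma, size=votes.shape)


def resample(candidates, noisy_votes, n_s, rng):
    """Draw n_s candidates with probability proportional to the clipped noisy votes.

    Clipping at zero is an assumption of this sketch, not the paper's projection.
    """
    weights = np.clip(noisy_votes, 0.0, None)
    if weights.sum() == 0.0:
        # Fall back to uniform sampling if noise wiped out every vote.
        weights = np.ones_like(weights)
    probs = weights / weights.sum()
    idx = rng.choice(len(candidates), size=n_s, p=probs)
    return candidates[idx]


rng = np.random.default_rng(0)
private = rng.uniform(size=(200, 2))      # toy stand-in for the sensitive dataset
candidates = rng.uniform(size=(50, 2))    # toy stand-in for synthetic candidates
hist = noisy_nn_histogram(private, candidates, sigma=1.0, rng=rng)
next_gen = resample(candidates, hist, n_s=50, rng=rng)
```

In a full PE loop, the resampled points would be perturbed by a variation API and the step repeated for $T$ rounds; that part is omitted here.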
Abstract
Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to understand PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a convex and compact domain, we prove that under the right hyperparameter settings and given access to the Gaussian variation API proposed in \cite{PE23}, PE produces an $(\varepsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance $\tilde{O}(d(n\varepsilon)^{-1/d})$ from the original; this establishes worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in experiments.
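As a rough illustration of how the stated rate $d(n\varepsilon)^{-1/d}$ depends on dimension, the sketch below evaluates that expression for a few values of $d$. The helper name `pe_w1_rate` is hypothetical, and the function drops the constants and logarithmic factors hidden by $\tilde{O}$, so the numbers are meaningful only for comparing trends, not as actual error guarantees.

```python
def pe_w1_rate(d: int, n: int, eps: float) -> float:
    """Evaluate d * (n * eps) ** (-1/d), ignoring constants and log factors."""
    if d < 1 or n < 1 or eps <= 0:
        raise ValueError("require d >= 1, n >= 1, eps > 0")
    return d * (n * eps) ** (-1.0 / d)


# For fixed n and eps, the expression decays more slowly as d grows.
for d in (1, 2, 8, 32):
    print(f"d={d:>2}  rate ~ {pe_w1_rate(d, n=10**6, eps=1.0):.4g}")
```

For any fixed $d$ the expression tends to zero as $n \to \infty$, which is the convergence statement, but the exponent $-1/d$ means that the number of samples needed for a given accuracy grows quickly with dimension.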
