Privacy Amplification Persists under Unlimited Synthetic Data Release

Clément Pierquin; Aurélien Bellet; Marc Tommasi; Matthieu Boussard

TL;DR

This work studies privacy amplification by synthetic data release from a private linear generator. It shows that amplification persists even when the number of synthetic records grows without bound, provided the generator parameters are bounded. The authors develop a unified, non-asymptotic framework based on Fisher information that links Rényi DP losses along a parameter path and bounds the divergence via an envelope; the resulting guarantee interpolates between the standard post-processing bound and genuine amplification. A key insight is that, in the unlimited-release regime, the privacy loss is governed by the Gram statistics $V^\top V$ and $W^\top W$, which allows tractable bounds via noncentral $\chi^2$ and Wishart analyses. The authors derive explicit, dimension-dependent bounds for one-dimensional and multi-dimensional releases and corroborate the theory with experiments showing rapid convergence to a plateau below the model-parameter post-processing bound. Overall, the results show how bounded parameter spaces and Gram-statistic reductions enable privacy guarantees for synthetic data release, and they suggest extending these ideas to more complex training procedures and deep generative models.
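
The role of the Gram statistic can be illustrated numerically. The sketch below assumes the released records are the rows of $ZV$, where $Z$ has i.i.d. standard Gaussian entries and $V$ is a $d \times k$ parameter matrix; these shapes, the function names, and the parameter values are assumptions made for illustration, not details taken from the paper. Under that assumption each record is distributed as $\mathcal{N}(0, V^\top V)$, so the empirical second-moment matrix of the release should approach $V^\top V$ as $n_{\text{syn}}$ grows.

```python
import numpy as np


def release_synthetic(V, n_syn, rng):
    """Draw n_syn synthetic records as the rows of Z @ V.

    Assumption: Z has i.i.d. standard Gaussian entries, so each
    released row is distributed as N(0, V^T V).
    """
    d = V.shape[0]
    Z = rng.standard_normal((n_syn, d))
    return Z @ V


def empirical_gram(S):
    """Normalized second-moment matrix of the released records."""
    return S.T @ S / S.shape[0]


rng = np.random.default_rng(0)
d, k = 20, 2
# Hypothetical generator parameters, scaled so that V^T V has O(1) entries.
V = rng.standard_normal((d, k)) / np.sqrt(d)
gram = V.T @ V

# Frobenius distance between the empirical and the true Gram matrix.
errors = {}
for n_syn in (100, 10_000, 200_000):
    S = release_synthetic(V, n_syn, rng)
    errors[n_syn] = np.linalg.norm(empirical_gram(S) - gram)
    print(f"n_syn={n_syn:>7d}  error={errors[n_syn]:.4f}")
```

The error shrinks as more records are released, which is consistent with the idea that an unlimited release reveals $V^\top V$ rather than $V$ itself.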

Abstract

We study privacy amplification by synthetic data release, a phenomenon in which differential privacy guarantees are improved by releasing only synthetic data rather than the private generative model itself. Recent work by Pierquin et al. (2025) established the first formal amplification guarantees for a linear generator, but they apply only in asymptotic regimes where the model dimension far exceeds the number of released synthetic records, limiting their practical relevance. In this work, we show a surprising result: under a bounded-parameter assumption, privacy amplification persists even when releasing an unbounded number of synthetic records, thereby improving upon the bounds of Pierquin et al. (2025). Our analysis provides structural insights that may guide the development of tighter privacy guarantees for more complex release mechanisms.

Paper Structure (40 sections, 47 theorems, 202 equations, 6 figures)

Introduction
Setting and Overview of our Results
Relationship between Fisher information and Rényi divergences
From Outputs to Sufficient Gram Statistics
Privacy Amplification in Linear Synthetic Data Generation
Releasing One-Dimensional Synthetic Data
Releasing Multi-Dimensional Synthetic Data
Experiments
Estimating $D_\alpha(ZV,ZW)$ as a function of $n_{\text{syn}}$
Estimating $D_\alpha(V^\top V, W^\top W)$ as a function of $d,\Delta$
Discussion
On the tightness of the plateau $D_\alpha(V^\top V, W^\top W)$
On the generalization to other training procedures
Conclusion
Background on differential privacy
...and 25 more sections

Key Result

Proposition 3.1

Let $\alpha > 1$. Let $P = \{P_\theta,\; \theta\in \Theta \subset \mathbb{R}\}$ be a family of probability distributions, with associated densities $p_\theta$. For $\theta \in \Theta$, let the Fisher information $I(\theta) = \mathbb{E}_{X \sim P_\theta}[(\partial_\theta \log p_\theta(X))^2]$ and $\D
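
For intuition, the Gaussian location family gives a case where the link between Fisher information and Rényi divergence can be checked by hand. This is a standard textbook computation offered as an illustration, not the statement of the proposition; the symbols $\sigma$ and $\theta'$ are introduced here only for the example. Take $P_\theta = \mathcal{N}(\theta, \sigma^2)$ with $\sigma > 0$ fixed:

$$
\partial_\theta \log p_\theta(x) = \frac{x - \theta}{\sigma^2},
\qquad
I(\theta) = \mathbb{E}_{X \sim P_\theta}\left[\frac{(X - \theta)^2}{\sigma^4}\right] = \frac{1}{\sigma^2},
$$

$$
D_\alpha(P_{\theta'}, P_\theta) = \frac{\alpha\,(\theta' - \theta)^2}{2\sigma^2} = \frac{\alpha}{2}\, I(\theta)\,(\theta' - \theta)^2 .
$$

In this family the quadratic expression is exact rather than only a local approximation. For instance, with $\sigma = 1$ and $\alpha = 2$, the divergence equals $(\theta' - \theta)^2$.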

Figures (6)

Figure 1: Summary of our results highlighting the improvements over prior work by Pierquin et al. (2025).
Figure 2: Empirical estimation of $D_\alpha(ZV,ZW)$ as a function of the number of released synthetic data $n_{\text{syn}}$ for multiple values of $d$, $k = \Delta = 1$, $C = \alpha = 2$.
Figure 3: Empirical estimation of $D_\alpha(V^\top V,W^\top W)$ as a function of $\Delta$ for multiple values of $d$, $k = 1$, $\|w\|_F \geq 1$, $C = 2$.
Figure 4: Representation of the trade-off function $f$ for $\Delta=1, C' = 1, d = 60, n_{\text{syn}} = 1, k=1$.
Figure 5: Comparison between the Rényi divergence $D_\alpha(N+v,N+w)$ and the criterion obtained from Proposition 3.2 as a function of $\Delta$, in the case $\sigma=1, \alpha = 2$.
...and 1 more figure

Theorems & Definitions (79)

Definition 2.1: Linear generation from Gaussian inputs (Pierquin et al., 2025)
Proposition 3.1: Local relationship between Rényi divergences and Fisher information (Haussler, 1997; van Erven, 2014)
Proposition 3.2: Upper bounding Rényi divergences through Fisher information
Proposition 4.1
Proposition 4.2
Proposition 4.3
Proposition 5.1
Theorem 5.1: Privacy amplification in the $n_{\mathrm{syn}} \to + \infty$ and high privacy regime, $k=1$
Theorem 5.2
Lemma 5.1: Rényi divergence equivalence along the SVD
...and 69 more
