Privacy Amplification Persists under Unlimited Synthetic Data Release

Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

TL;DR

This work studies privacy amplification by releasing synthetic data from a private linear generator, showing that amplification persists even when the number of synthetic records grows without bound, provided the generator parameters are bounded. The authors develop a unified, non-asymptotic framework based on Fisher information that links Rényi DP losses along a parameter path and bounds the divergence via an envelope, yielding an amplification effect that interpolates between standard post-processing and true amplification. A key insight is that in the unlimited-release regime, the privacy loss is governed by Gram statistics $V^\top V$ and $W^\top W$, allowing tractable bounds via noncentral $\chi^2$ and Wishart analyses; they derive explicit, dimension-dependent bounds for the one-dimensional and multi-dimensional cases and corroborate the theory with experiments showing rapid convergence to a plateau below the model-parameter post-processing bound. Overall, the results illuminate how bounded parameter spaces and Gram-statistic reductions enable robust privacy guarantees for synthetic-data releases and suggest promising directions for extending these ideas to more complex training procedures and deep generative models.
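The Gram-statistic reduction mentioned above can be illustrated with a minimal sketch. It assumes that the release has the form $ZV$, where $Z$ has i.i.d. standard Gaussian entries and $V$ is the $d \times k$ generator parameter (the dimensions, the scaling of `V`, and the helper names `release_synthetic` and `empirical_gram` are hypothetical, chosen only for illustration). Under that assumption each released row is distributed as $N(0, V^\top V)$, so the empirical second-moment matrix of the release approaches $V^\top V$ as the number of records grows, and an observer of unlimited synthetic data learns the parameter only through this statistic.

```python
import numpy as np


def release_synthetic(V, n_syn, rng):
    """Return n_syn synthetic records as the rows of Z @ V.

    Assumption: Z has i.i.d. standard Gaussian entries, so each row is N(0, V^T V).
    """
    Z = rng.standard_normal((n_syn, V.shape[0]))
    return Z @ V


def empirical_gram(X):
    """Empirical second-moment matrix of the released rows (estimates V^T V)."""
    return X.T @ X / X.shape[0]


rng = np.random.default_rng(0)
d, k = 10, 2                                   # hypothetical dimensions
V = rng.standard_normal((d, k)) / np.sqrt(d)   # hypothetical generator parameters
gram = V.T @ V

# Measure how far the empirical Gram matrix is from V^T V for growing releases.
errors = {}
for n_syn in (100, 10_000, 100_000):
    X = release_synthetic(V, n_syn, rng)
    errors[n_syn] = np.linalg.norm(empirical_gram(X) - gram)
    print(f"n_syn={n_syn:>7d}  ||X^T X / n_syn - V^T V||_F = {errors[n_syn]:.4f}")

# Rotating V by an orthogonal Q leaves V^T V, and hence the release distribution, unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
same_gram = np.allclose((Q @ V).T @ (Q @ V), gram)
print("Gram statistic invariant under rotation:", same_gram)
```

The rotation check shows why the release cannot reveal more than $V^\top V$ in this sketch: two parameters with the same Gram matrix induce the same distribution over synthetic records.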

Abstract

We study privacy amplification by synthetic data release, a phenomenon in which differential privacy guarantees are improved by releasing only synthetic data rather than the private generative model itself. Recent work by Pierquin et al. (2025) established the first formal amplification guarantees for a linear generator, but they apply only in asymptotic regimes where the model dimension far exceeds the number of released synthetic records, limiting their practical relevance. In this work, we show a surprising result: under a bounded-parameter assumption, privacy amplification persists even when releasing an unbounded number of synthetic records, thereby improving upon the bounds of Pierquin et al. (2025). Our analysis provides structural insights that may guide the development of tighter privacy guarantees for more complex release mechanisms.

Paper Structure (40 sections, 47 theorems, 202 equations, 6 figures)

Key Result

Proposition 3.1

Let $\alpha > 1$. Let $P = \{P_\theta,\; \theta\in \Theta \subset \mathbb{R}\}$ be a family of probability distributions, with associated densities $p_\theta$. For $\theta \in \Theta$, let the Fisher information $I(\theta) = \mathbb{E}_{X \sim P_\theta}[(\partial_\theta \log p_\theta(X))^2]$ and $\D
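As a concrete instance of the Fisher information defined above, consider the Gaussian location family $P_\theta = N(\theta, \sigma^2)$, for which $I(\theta) = 1/\sigma^2$ and the Rényi divergence between two shifted Gaussians has the closed form $\alpha \Delta^2 / (2\sigma^2)$. The sketch below checks both facts numerically and compares them with the standard second-order quantity $\tfrac{\alpha}{2} I(\theta) \Delta^2$. Since the proposition statement is truncated here, treating this quantity as the one the proposition refers to is an assumption; the function names are hypothetical.

```python
import numpy as np


def renyi_gaussian_shift(alpha, delta, sigma):
    """Closed form of D_alpha(N(0, sigma^2) || N(delta, sigma^2))."""
    return alpha * delta**2 / (2.0 * sigma**2)


def renyi_numeric(alpha, delta, sigma, grid=200_001, width=12.0):
    """Evaluate (1 / (alpha - 1)) * log of the integral of p^alpha q^(1 - alpha) on a grid."""
    x = np.linspace(-width * sigma - abs(delta), width * sigma + abs(delta), grid)
    log_norm = -0.5 * np.log(2.0 * np.pi * sigma**2)
    log_p = log_norm - 0.5 * (x / sigma) ** 2
    log_q = log_norm - 0.5 * ((x - delta) / sigma) ** 2
    integrand = np.exp(alpha * log_p + (1.0 - alpha) * log_q)
    integral = np.sum(integrand) * (x[1] - x[0])  # Riemann sum on a uniform grid
    return np.log(integral) / (alpha - 1.0)


def fisher_information_mc(theta, sigma, n, rng):
    """Monte Carlo estimate of E[(d/dtheta log p_theta(X))^2] for X ~ N(theta, sigma^2)."""
    x = rng.normal(theta, sigma, size=n)
    score = (x - theta) / sigma**2
    return np.mean(score**2)


alpha, delta, sigma = 2.0, 1.0, 1.0
rng = np.random.default_rng(0)

fisher = fisher_information_mc(theta=0.0, sigma=sigma, n=200_000, rng=rng)
closed_form = renyi_gaussian_shift(alpha, delta, sigma)
numeric = renyi_numeric(alpha, delta, sigma)
local_term = 0.5 * alpha * fisher * delta**2

print(f"Fisher information (Monte Carlo): {fisher:.4f}")
print(f"Renyi divergence (closed form):   {closed_form:.4f}")
print(f"Renyi divergence (numeric):       {numeric:.4f}")
print(f"(alpha / 2) * I(theta) * delta^2: {local_term:.4f}")
```

For this family the second-order term coincides with the exact divergence at every $\Delta$, which makes it a convenient sanity check; for other families the two generally agree only locally.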

Figures (6)

  • Figure 1: Summary of our results highlighting the improvements over prior work by Pierquin et al. (2025).
  • Figure 2: Empirical estimation of $D_\alpha(ZV,ZW)$ as a function of the number of released synthetic data $n_{\text{syn}}$ for multiple values of $d$, $k = \Delta = 1$, $C = \alpha = 2$.
  • Figure 3: Empirical estimation of $D_\alpha(V^\top V,W^\top W)$ as a function of $\Delta$ for multiple values of $d$, $k = 1$, $\|w\|_F \geq 1$, $C = 2$.
  • Figure 4: Representation of the trade-off function $f$ for $\Delta=1, C' = 1, d = 60, n_{\text{syn}} = 1, k=1$.
  • Figure 5: Comparison between the Rényi divergence $D_\alpha(N+v,N+w)$ and the criterion obtained from Proposition 3.2 as a function of $\Delta$, in the case $\sigma=1, \alpha = 2$.
  • ...and 1 more figure

Theorems & Definitions (79)

  • Definition 2.1: Linear generation from Gaussian inputs (Pierquin et al., 2025)
  • Proposition 3.1: Local relationship between Rényi divergences and Fisher information (Haussler, 1997; van Erven, 2014)
  • Proposition 3.2: Upper bounding Rényi divergences through Fisher information
  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 5.1
  • Theorem 5.1: Privacy amplification in the $n_{\mathrm{syn}} \to + \infty$ and high privacy regime, $k=1$
  • Theorem 5.2
  • Lemma 5.1: Rényi divergence equivalence along the SVD
  • ...and 69 more