Connect the dots: Dataset Condensation, Differential Privacy, and Adversarial Uncertainty
Kenneth Odoh
TL;DR
This work frames dataset condensation (DC) within $$(\epsilon, \delta)$$-differential privacy and argues that adversarial uncertainty can yield a principled lower bound on the noise level $$\epsilon$$ while preserving utility. It provides a mathematical formulation of DC and links it to differential privacy, showing how DC can act as the DP randomizer and how $$\epsilon$$ can be inferred from DC outputs. The Appendix offers formal DP guarantees for the default and relaxed threat models, with explicit bounds on $$\epsilon$$ and $$\delta$$, suggesting a practical route to privacy-preserving synthetic data generation. Overall, the paper clarifies how to obtain high-fidelity, privacy-preserving synthetic data by selecting the noise level in DC via adversarial uncertainty.
Abstract
Our work focuses on understanding the underlying mechanism of dataset condensation by drawing connections with ($ε$, $δ$)-differential privacy, where the optimal noise level, $ε$, is chosen by adversarial uncertainty \cite{Grining2017}. This connection lets us address how the dataset condensation procedure works internally. Previous work \cite{dong2022} proved the link between dataset condensation (DC) and ($ε$, $δ$)-differential privacy. However, existing work does not make clear how to ablate DC to obtain a lower-bound estimate of $ε$ that suffices for creating high-fidelity synthetic data. We suggest that adversarial uncertainty is the most appropriate method for achieving an optimal noise level, $ε$. As part of the internal dynamics of dataset condensation, we adopt a suitable noise-estimation scheme that guarantees high-fidelity data while providing privacy.
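
For reference, the standard definition of ($ε$, $δ$)-differential privacy invoked above can be written as follows. The symbols $\mathcal{M}$ (the randomized mechanism, which in this setting would be the condensation procedure), $D$ and $D'$ (neighboring datasets differing in one record), and $S$ (a set of possible outputs) are notation assumed here for illustration and are not taken from the abstract.

$$
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
\quad \text{for all neighboring } D, D' \text{ and all measurable } S .
$$

A smaller $ε$ gives a stronger privacy guarantee and typically requires more noise, which is why a lower bound on $ε$ that still suffices for privacy matters for the fidelity of the synthetic data.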
