Connect the dots: Dataset Condensation, Differential Privacy, and Adversarial Uncertainty

Kenneth Odoh

TL;DR

This work frames dataset condensation (DC) within $(\epsilon, \delta)$-differential privacy (DP) and argues that adversarial uncertainty can yield a principled lower bound on the noise level $\epsilon$ while preserving utility. It gives a mathematical formulation of DC and links it to differential privacy, showing how DC can act as the DP randomizer and how $\epsilon$ can be inferred from DC outputs. The Appendix offers formal DP guarantees for the default and relaxed threat models, with explicit bounds on $\epsilon$ and $\delta$, which suggests a practical route to privacy-preserving synthetic data generation. Overall, the paper clarifies how to obtain high-fidelity, privacy-preserving synthetic data by selecting the noise level in DC via adversarial uncertainty.

Abstract

Our work aims to explain the mechanism underpinning dataset condensation (DC) by connecting it to $(\epsilon, \delta)$-differential privacy, where the optimal noise level, $\epsilon$, is chosen by adversarial uncertainty \cite{Grining2017}. This connection lets us address how the dataset condensation procedure works internally. Previous work \cite{dong2022} proved the link between DC and $(\epsilon, \delta)$-differential privacy. However, existing work on ablating DC does not make clear how to obtain a lower-bound estimate of $\epsilon$ that suffices for creating high-fidelity synthetic data. We suggest that adversarial uncertainty is the most appropriate method for achieving an optimal noise level, $\epsilon$. As part of the internal dynamics of dataset condensation, we adopt a noise-estimation scheme that is sufficient to guarantee high-fidelity data while providing privacy.
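
For reference, the standard textbook definition of $(\epsilon, \delta)$-differential privacy is stated below. It is not taken from the paper's own equations, and the symbols $\mathcal{M}$, $D$, $D'$, and $S$ are notation assumed here for illustration. A randomized mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-differential privacy if, for every pair of neighboring datasets $D$ and $D'$ (differing in a single record) and every measurable set of outputs $S$,

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
$$

In the framing of this work, DC plays the role of the randomizer $\mathcal{M}$. Under the standard definition, a smaller $\epsilon$ gives a stronger guarantee and generally requires more noise, which is why a lower bound on $\epsilon$ matters for keeping the synthetic data useful.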
