REAEDP: Entropy-Calibrated Differentially Private Data Release with Formal Guarantees and Attack-Based Evaluation

Bo Ma, Jinsong Wu, Wei Qi Yan

Abstract

Sensitive data release is vulnerable to output-side privacy threats such as membership inference, attribute inference, and record linkage. This creates a practical need for release mechanisms that provide formal privacy guarantees while preserving utility in measurable ways. We propose REAEDP, a differential privacy framework that combines entropy-calibrated histogram release, a synthetic-data release mechanism, and attack-based evaluation. On the theory side, we derive an explicit sensitivity bound for Shannon entropy, together with an extension to Rényi entropy, for adjacent histogram datasets, enabling calibrated differentially private release of histogram statistics. We further study a synthetic-data mechanism $\mathcal{F}$ with a privacy-test structure and show that it satisfies a formal differential privacy guarantee under the stated parameter conditions. On multiple public tabular datasets, the empirical entropy change remains below the theoretical bound in the tested regime, standard Laplace and Gaussian baselines exhibit comparable trends, and both membership-inference and linkage-style attack performance move toward random-guess behavior as the privacy parameter decreases. These results support REAEDP as a practically usable privacy-preserving release pipeline in the tested settings. Source code: https://github.com/mabo1215/REAEDP.git

Paper Structure

This paper contains 60 sections, 9 theorems, 42 equations, 12 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

If mechanism $\mathcal{M}_i$ is $(\varepsilon_i,\delta_i)$-DP for $i=1,\ldots,k$, then the composition $(\mathcal{M}_1(D),\ldots,\mathcal{M}_k(D))$ is $\bigl(\sum_{i=1}^k \varepsilon_i,\, \sum_{i=1}^k \delta_i\bigr)$-DP. See the proofs appendix for references.
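
As a small illustration of Theorem 1, the sketch below sums per-mechanism budgets to obtain the overall guarantee. The helper name `compose_sequential` and the example budgets are hypothetical; they are not taken from the paper or its repository.

```python
from typing import Iterable, Tuple


def compose_sequential(budgets: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (epsilon, delta) given by basic sequential composition.

    Each item is the (epsilon_i, delta_i) of one mechanism run on the same dataset.
    Hypothetical helper, for illustration only.
    """
    total_eps = 0.0
    total_delta = 0.0
    for eps, delta in budgets:
        if eps < 0 or not (0.0 <= delta <= 1.0):
            raise ValueError("need epsilon >= 0 and 0 <= delta <= 1")
        total_eps += eps
        total_delta += delta
    return total_eps, total_delta


# Made-up budgets for three releases computed on the same dataset.
example_budgets = [(0.5, 1e-6), (0.3, 0.0), (0.2, 1e-6)]
total_eps, total_delta = compose_sequential(example_budgets)
print(f"composed guarantee: ({total_eps:.2f}, {total_delta:.1e})-DP")
```

Summing budgets is the simplest form of privacy accounting. The advanced composition result listed as Theorem 2 offers an alternative accounting that is not sketched here.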

Figures (12)

  • Figure 1: Privacy test pass rate vs. $\gamma$ for several $k$ ($t=2$).
  • Figure 2: Wiener kernel: original vs. private mean for $\rho = 10^{-6}$, $0.001$, $0.1$ (left to right).
  • Figure 3: Entropy sensitivity bound $\Delta_H$ vs. dataset size $n$, illustrating the decrease of the theoretical bound used for calibrated histogram release.
  • Figure 4: Empirical $\widehat{\Delta H}$ vs. theoretical bound (Theorem \ref{thm4}): mean and maximum over adjacent pairs. The ratio remains below 1 in the tested regime, indicating that the empirical entropy sensitivity stays below the theoretical bound.
  • Figure 5: Baseline comparison: entropy error and count MAE vs. $\varepsilon$ (Laplace, Gaussian, DP synthetic (Laplace), DP synthetic (Gaussian)); $\Delta_H$ bound shown.
  • ...and 7 more figures
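
The empirical quantity in Figure 4 (mean and maximum entropy change over adjacent pairs) could be computed along the following lines. This is a minimal sketch under stated assumptions: entropy is measured in nats, adjacency means moving one record from one histogram bin to another (replacement adjacency), and all such neighbors are enumerated exhaustively. The function names and the example histogram are hypothetical, and the paper's own sampling procedure and theoretical bound are not reproduced here.

```python
import math
from typing import List, Sequence


def shannon_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in nats) of the empirical distribution given by bin counts."""
    n = sum(counts)
    if n <= 0:
        raise ValueError("histogram must contain at least one record")
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def empirical_entropy_changes(counts: Sequence[int]) -> List[float]:
    """|H(D) - H(D')| for every D' obtained by moving one record to another bin."""
    base = shannon_entropy(counts)
    changes = []
    for i, c in enumerate(counts):
        if c == 0:
            continue  # an empty bin has no record to replace
        for j in range(len(counts)):
            if j == i:
                continue
            neighbor = list(counts)
            neighbor[i] -= 1  # remove one record from bin i
            neighbor[j] += 1  # and place its replacement in bin j
            changes.append(abs(base - shannon_entropy(neighbor)))
    return changes


hist = [40, 30, 20, 10]  # made-up histogram with n = 100 records
changes = empirical_entropy_changes(hist)
mean_change = sum(changes) / len(changes)
max_change = max(changes)
print(f"mean={mean_change:.5f}, max={max_change:.5f}")
```

Dividing `max_change` by a theoretical sensitivity bound would give the kind of ratio the caption describes; the bound itself has to come from the paper's entropy sensitivity theorem.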

Theorems & Definitions (16)

  • Definition 1: $(\varepsilon,\delta)$-Differential Privacy \cite{dwork2006differential}
  • Theorem 1: Sequential Composition
  • Theorem 2: Advanced Composition \cite{dwork2014algorithmic}
  • Theorem 3: Shannon entropy sensitivity under replacement adjacency
  • Lemma 1
  • Lemma 2: Neighboring datasets
  • Lemma 3
  • Corollary 1
  • Lemma 4
  • Theorem 4: Differential privacy of $\mathcal{F}$ under add/remove adjacency
  • ...and 6 more