Residual-PAC Privacy: Automatic Privacy Control Beyond the Gaussian Barrier

Tao Zhang, Yevgeniy Vorobeychik

TL;DR

The paper addresses the inefficiency of Gaussian-based Auto-PAC privacy by introducing Residual-PAC (R-PAC) Privacy and the Stackelberg Residual-PAC (SR-PAC) mechanism, which quantify and exploit non-Gaussian structure in data to allocate privacy noise more efficiently. It first characterizes the Gaussian barrier in PAC Privacy: the Gaussian mutual information upper bound is tight if and only if the perturbed mechanism output is jointly Gaussian with independent Gaussian noise. It then provides two post-processing corrections based on the Donsker-Varadhan (DV) representation and sliced Wasserstein distances. Building on this, it defines R-PAC Privacy, which uses f-divergences to quantify the privacy that remains after adversarial inference, and implements SR-PAC as a convex Stackelberg game that automatically selects noise distributions maximizing utility subject to a privacy budget. The approach yields tighter use of the privacy budget, anisotropic noise tailored to the data geometry, and provable advantages under composition, supported by experiments against PAC and DP baselines. Overall, SR-PAC is reported to deliver improved privacy-utility tradeoffs across the tested datasets and distributions, with theoretical guarantees and practical scalability through Monte Carlo simulation.

Abstract

The Probably Approximately Correct (PAC) Privacy framework [46] provides a powerful instance-based methodology to preserve privacy in complex data-driven systems. Existing PAC Privacy algorithms (we call them Auto-PAC) rely on a Gaussian mutual information upper bound. However, we show that the upper bound obtained by these algorithms is tight if and only if the perturbed mechanism output is jointly Gaussian with independent Gaussian noise. We propose two approaches for addressing this issue. First, we introduce two tractable post-processing methods for Auto-PAC, based on Donsker-Varadhan representation and sliced Wasserstein distances. However, the result still leaves wasted privacy budget. To address this issue more fundamentally, we introduce Residual-PAC (R-PAC) Privacy, an f-divergence-based measure to quantify privacy that remains after adversarial inference. To implement R-PAC Privacy in practice, we propose a Stackelberg Residual-PAC (SR-PAC) privatization mechanism, a game-theoretic framework that selects optimal noise distributions through convex bilevel optimization. Our approach achieves efficient privacy budget utilization for arbitrary data distributions and naturally composes when multiple mechanisms access the dataset. Through extensive experiments, we demonstrate that SR-PAC consistently obtains a better privacy-utility tradeoff than both PAC and differential privacy baselines.
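For readers unfamiliar with the Donsker-Varadhan representation named in the abstract, the textbook identity is sketched below. This is the standard statement only; the abstract does not say in what form the paper's post-processing method applies it. The symbols $P$, $Q$, $T$, and $Y$ are introduced here for illustration and are not taken from the source.

```latex
% Donsker-Varadhan variational representation of the KL divergence.
% The supremum runs over measurable functions T with E_Q[e^T] finite.
\[
D_{\mathrm{KL}}(P \,\|\, Q)
  = \sup_{T} \left\{ \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[ e^{T} \right] \right\}
\]
% Mutual information is the KL divergence between the joint distribution
% and the product of its marginals, so the identity above applies to it.
\[
\mathtt{MI}(X; Y) = D_{\mathrm{KL}}\!\left( P_{XY} \,\|\, P_{X} \otimes P_{Y} \right)
\]
```

Any fixed choice of $T$ gives a lower bound on the divergence, which is what makes sample-based estimation with this representation tractable.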

Paper Structure

This paper contains 65 sections, 22 theorems, 148 equations, 5 figures, 6 tables, and 6 algorithms.

Key Result

Theorem 1

For an arbitrary deterministic mechanism $\mathcal{M}$ and Gaussian noise $B \sim \mathcal{N}(0, \Sigma_B)$, the mutual information satisfies $\mathtt{MI}(X; \mathcal{M}(X)+B) \leq \frac{1}{2} \log\det\left( I_d + \Sigma_{\mathcal{M}(X)} \Sigma_B^{-1} \right)$. Moreover, there exists $\Sigma_B$ such that $\mathbb{E}[\|B\|_2^2] = \left( \sum_{j=1}^d \sqrt{\lambda_j} \right)^2$, with $\{\lambda_j\}$ being the eigenvalues of $\Sigma_{\mathcal{M}(X)}$, and $\mathtt{MI}(X; \mathcal{M}(X)+B) \leq \frac{1}{2}$.
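The second claim of the theorem can be checked numerically with a minimal sketch. It makes two assumptions that are not stated in the text above: the Gaussian bound takes the log-determinant form $\frac{1}{2}\log\det(I_d + \Sigma_{\mathcal{M}(X)}\Sigma_B^{-1})$, and $\Sigma_B$ shares eigenvectors with $\Sigma_{\mathcal{M}(X)}$, with noise eigenvalues $e_j = \sqrt{\lambda_j}\sum_k \sqrt{\lambda_k} / (2\beta)$. The function names and the budget parameter `beta` are hypothetical; `beta = 0.5` corresponds to the bound of $\frac{1}{2}$ in the theorem.

```python
import math


def noise_spectrum(eigvals, beta=0.5):
    """Eigenvalues of a candidate noise covariance Sigma_B.

    Assumption (illustrative, not from the source): Sigma_B shares
    eigenvectors with Sigma_{M(X)} and has eigenvalues
    e_j = sqrt(lambda_j) * sum_k sqrt(lambda_k) / (2 * beta).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if any(lam <= 0 for lam in eigvals):
        raise ValueError("this sketch assumes strictly positive eigenvalues")
    root_sum = sum(math.sqrt(lam) for lam in eigvals)
    return [math.sqrt(lam) * root_sum / (2.0 * beta) for lam in eigvals]


def logdet_bound(eigvals, noise_eigs):
    """0.5 * log det(I + Sigma_M Sigma_B^{-1}) for matrices sharing eigenvectors.

    With a shared eigenbasis the determinant factorizes over eigenvalue pairs.
    """
    return 0.5 * sum(math.log1p(lam / e) for lam, e in zip(eigvals, noise_eigs))


# Hypothetical spectrum of Sigma_{M(X)}, chosen only for illustration.
eigvals = [4.0, 1.0, 0.25]
noise = noise_spectrum(eigvals)          # beta = 0.5
noise_power = sum(noise)                 # E[||B||_2^2] = trace(Sigma_B)
target_power = sum(math.sqrt(lam) for lam in eigvals) ** 2
bound = logdet_bound(eigvals, noise)

print(noise_power, target_power, bound)
```

Under these assumptions the trace of $\Sigma_B$ equals $(\sum_j \sqrt{\lambda_j})^2$ exactly, and the bound stays below $\frac{1}{2}$ because $\log(1+x) \leq x$ gives $\frac{1}{2}\sum_j \lambda_j / e_j = \frac{1}{2}$ as a ceiling. Note that the noise is anisotropic: directions with larger output variance receive more noise.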

Figures (5)

  • Figure 1: Empirical comparisons of SR-PAC, Auto-PAC (Algorithm \ref{alg:PAC_original}), and Efficient-PAC (Algorithm \ref{alg:PAC_alg_original}) on CIFAR-10, CIFAR-100, MNIST, and AG-News as $\beta$ varies. Each column corresponds to one dataset; within each column, the three panels report (top) classification accuracy of the perturbed model versus the target budget $\beta$, (middle) the average noise magnitude $\mathbb{E}[\lVert B\rVert_{2}^{2}]$ used by each method, and (bottom) the "target versus achieved" privacy budget (conditional entropy) for SR-PAC.
  • Figure 2: Empirical comparisons of DP, Auto-PAC, Efficient-PAC, and SR-PAC on mean estimation, using the Iris and Rice datasets, in terms of average noise magnitude $\mathbb{E}[\|B\|^2_2]$. All the numerical values are shown in Tables \ref{tab:Iris_value} and \ref{tab:Rice_value}.
  • Figure 3: Noise magnitudes of SR-PAC in Fig. \ref{fig:DP_comparisons}a and b. All the numerical values are shown in Tables \ref{tab:Iris_value} and \ref{tab:Rice_value}.
  • Figure 4: SR-PAC's performance in achieving the target privacy budget in Fig. \ref{fig:DP_comparisons}a and b. All the numerical values are shown in Tables \ref{tab:Iris_value} and \ref{tab:Rice_value}.
  • Figure 5: Performance of the empirical membership inference attack using empirical LIRA, measured by the empirical posterior success rate (PSR). All the numerical values are shown in Tables \ref{tab:Iris_value} and \ref{tab:Rice_value}.

Theorems & Definitions (35)

  • Definition 1: $(\delta, \rho, \mathcal{D})$-PAC Privacy [xiao2023pac]
  • Definition 2: $f$-Divergence
  • Definition 3: $(\Delta_f^\delta, \rho, \mathcal{D})$ PAC Advantage Privacy [xiao2023pac]
  • Definition 4: Mutual Information
  • Theorem 1: Theorem 3 of [xiao2023pac]
  • Definition 5: $(\epsilon, \Bar{\delta})$-Differential Privacy [dwork2006calibrating]
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • ...and 25 more