
Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

TL;DR

This work tackles the problem of obtaining tight $(\varepsilon,\delta)$-DP guarantees for subsampled mechanisms under composition, focusing on two common pitfalls: assuming that a single worst-case pair of neighbouring datasets governs self-composition, and assuming that Poisson subsampling and sampling without replacement (WOR) yield interchangeable privacy guarantees. It uses privacy loss distributions (PLDs) and the dominating-pair framework to analyze both the add/remove and substitution neighbouring relations, and constructs explicit dominating pairs for the Gaussian and Laplace mechanisms under various subsampling schemes. Key contributions include counterexamples showing that a pair of datasets that is worst-case for a single iteration need not remain worst-case under composition, explicit dominating-pair constructions (including for the Gaussian mechanism under WOR with substitution) with associated PLD-based bounds, and a detailed comparison demonstrating that Poisson and WOR can yield significantly different privacy parameters in DP-SGD-like settings. The findings guide practitioners to align the accounting method with the subsampling technique actually used, to report accounting hyperparameters, and to exercise caution when comparing privacy-performance trade-offs across papers, particularly for DP-SGD and related algorithms.

Abstract

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have privacy guarantees similar to those of sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for parameter values that could realistically be chosen for DP-SGD.
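The two sampling schemes contrasted above already differ in the batches they produce. The following minimal Python sketch (the function names `poisson_subsample` and `wor_subsample` are hypothetical, not from the paper) draws batches both ways with matched expected size: under Poisson subsampling each record is included independently with probability $q$, so the batch size is random, whereas sampling without replacement always returns exactly $m$ distinct records. It only illustrates the sampling procedures and does not compute any privacy parameters.

```python
import random

def poisson_subsample(dataset, q, rng):
    """Hypothetical helper: include each record independently with probability q."""
    return [x for x in dataset if rng.random() < q]

def wor_subsample(dataset, m, rng):
    """Hypothetical helper: pick exactly m distinct records uniformly at random."""
    return rng.sample(dataset, m)

rng = random.Random(0)
data = list(range(1000))
q = 0.01
m = round(q * len(data))  # fixed WOR batch size matching the Poisson expected size

poisson_sizes = [len(poisson_subsample(data, q, rng)) for _ in range(2000)]
wor_sizes = [len(wor_subsample(data, m, rng)) for _ in range(2000)]

print("Poisson: mean batch size", sum(poisson_sizes) / len(poisson_sizes),
      "range", min(poisson_sizes), "-", max(poisson_sizes))
print("WOR: batch sizes observed", set(wor_sizes))
```

An accountant should be matched to whichever of these two procedures the training loop actually uses.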

Paper Structure

This paper contains 9 sections, 7 theorems, 28 equations, 4 figures, and 1 table.

Key Result

Theorem 8

If $(P, Q)$ is a dominating pair for a mechanism $\mathcal{M}$, then $(P^k, Q^k)$ is a dominating pair for $k$ iterations of $\mathcal{M}$, where $P^k$ and $Q^k$ denote the $k$-fold product distributions.
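Theorem 8 is what allows a PLD accountant to work with a single pair of distributions: composing the pair yields product distributions, and since the privacy loss of a product is the sum of independent per-step losses, the PLD of $(P^k, Q^k)$ is the $k$-fold convolution of the PLD of $(P, Q)$. The sketch below is a toy illustration assuming a finite pair (randomized response with flip probability 1/4, chosen here for illustration and not taken from the paper); it checks numerically that computing $\delta(\varepsilon) = H_{e^\varepsilon}(P^k \| Q^k)$ directly from the product distributions agrees with the convolved PLD. All helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict

def hockey_stick(p, q, eps):
    """H_{e^eps}(P || Q) = sum_x max(0, P(x) - e^eps * Q(x)) for finite P, Q."""
    alpha = math.exp(eps)
    return sum(max(0.0, px - alpha * q.get(x, 0.0)) for x, px in p.items())

def product_dist(dist, k):
    """k-fold product distribution over k-tuples of outcomes."""
    return {xs: math.prod(dist[x] for x in xs)
            for xs in itertools.product(dist, repeat=k)}

def pld(p, q):
    """Privacy loss distribution: law of L = log(P(x)/Q(x)) for x ~ P.
    Assumes Q(x) > 0 wherever P(x) > 0."""
    out = defaultdict(float)
    for x, px in p.items():
        out[math.log(px / q[x])] += px
    return dict(out)

def convolve(a, b):
    """Privacy losses of independent steps add, so their PLDs convolve."""
    out = defaultdict(float)
    for la, wa in a.items():
        for lb, wb in b.items():
            out[la + lb] += wa * wb
    return dict(out)

def delta_from_pld(dist, eps):
    """delta(eps) = E[max(0, 1 - e^(eps - L))] for L drawn from the PLD."""
    return sum(w * max(0.0, 1.0 - math.exp(eps - l)) for l, w in dist.items())

# Toy pair for illustration: randomized response with flip probability 1/4.
P = {0: 0.75, 1: 0.25}
Q = {0: 0.25, 1: 0.75}
k = 5

composed = pld(P, Q)
for _ in range(k - 1):
    composed = convolve(composed, pld(P, Q))

for eps in [0.0, 1.0, 2.0, 4.0]:
    direct = hockey_stick(product_dist(P, k), product_dist(Q, k), eps)
    via_pld = delta_from_pld(composed, eps)
    print(f"eps={eps}: product={direct:.6f}  PLD={via_pld:.6f}")
```

Enumerating the product grows exponentially in $k$, while the convolution works with privacy-loss values rather than outcomes; practical accountants go further and discretize the privacy loss onto a grid so that the convolutions can be computed efficiently, for example with FFTs.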

Figures (4)

  • Figure 1: Privacy curves for the subsampled Laplace mechanism under the remove and add neighbouring relations. For a single iteration, the privacy curve under the remove relation dominates the one under the add relation for $\varepsilon \geq 0$, but this dominance is not preserved by composition.
  • Figure 2: Plots of the smallest noise multiplier $\sigma$ required to achieve certain privacy parameters for the subsampled Gaussian mechanism with varying sampling rates under add/remove. Each line shows a specific value of $\varepsilon$ for either Poisson subsampling or sampling without replacement. The parameter $\delta$ is fixed to $10^{-6}$ for all lines.
  • Figure 3: Hockey-stick divergence of the Laplace mechanism when sampling without replacement under $\sim_S$. The worst-case pair of datasets depends on the value of $\varepsilon$.
  • Figure 4: Hockey-stick divergence for the Gaussian mechanism under substitution when sampling without replacement, using a dominating pair of distributions. The dominating pair is constructed from a point-wise maximum of the privacy curves for a single iteration, as seen in the left plot. The right plot compares the privacy curve from self-composing the dominating pair with a lower bound obtained from self-composing the PLD that corresponds to the blue line in the left plot. The dotted line, obtained with the RDP accountant, is included as a reference for scale: the difference between the blue and the dotted line corresponds to the difference between the PLD and RDP accountants for Poisson subsampling under add/remove.
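For intuition about the single-iteration curves that Figures 1 and 2 build on, the following sketch computes the exact one-step privacy curve of the Poisson-subsampled Gaussian mechanism (sensitivity 1, noise $\sigma$, sampling rate $q$) in both directions. It assumes the widely used pair $A = (1-q)\,\mathcal{N}(0,\sigma^2) + q\,\mathcal{N}(1,\sigma^2)$ and $B = \mathcal{N}(0,\sigma^2)$; $H_{e^\varepsilon}(A \| B)$ and $H_{e^\varepsilon}(B \| A)$ correspond to removing and adding a record, and which of the two is called "remove" depends on the convention in use. Because the likelihood ratio is monotone in $x$, each divergence reduces to a Gaussian tail computation. Function names are hypothetical, and the sketch covers one iteration only; as Figure 1 shows for the Laplace mechanism, a comparison between the two directions at a single iteration need not carry over to the composition.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def delta_mix_vs_base(eps, q, sigma):
    """H_{e^eps}(A || B) for A = (1-q) N(0, sigma^2) + q N(1, sigma^2), B = N(0, sigma^2).
    Requires 0 < q <= 1 and sigma > 0."""
    alpha = math.exp(eps)
    if alpha <= 1.0 - q:
        # A/B >= 1 - q >= alpha everywhere, so the whole space is the worst event.
        return 1.0 - alpha
    # A/B = (1-q) + q * exp((2x - 1) / (2 sigma^2)) increases in x and equals
    # alpha at x = t, so the worst event is the half-line x > t.
    t = sigma**2 * math.log((alpha - (1.0 - q)) / q) + 0.5
    tail_a = (1.0 - q) * norm_cdf(-t / sigma) + q * norm_cdf((1.0 - t) / sigma)
    tail_b = norm_cdf(-t / sigma)
    return tail_a - alpha * tail_b

def delta_base_vs_mix(eps, q, sigma):
    """H_{e^eps}(B || A), the opposite direction. Requires 0 < q <= 1, sigma > 0."""
    alpha = math.exp(eps)
    if alpha * (1.0 - q) >= 1.0:
        # B/A < 1/(1-q) <= alpha everywhere, so no event has positive value.
        return 0.0
    # B/A decreases in x and equals alpha at x = t: the worst event is x < t.
    t = sigma**2 * math.log((1.0 / alpha - (1.0 - q)) / q) + 0.5
    cdf_b = norm_cdf(t / sigma)
    cdf_a = (1.0 - q) * norm_cdf(t / sigma) + q * norm_cdf((t - 1.0) / sigma)
    return cdf_b - alpha * cdf_a

for eps in [0.0, 0.5, 1.0, 2.0]:
    print(eps, delta_mix_vs_base(eps, q=0.01, sigma=1.0),
          delta_base_vs_mix(eps, q=0.01, sigma=1.0))
```

With $q = 1$ both functions reduce to the analytic Gaussian curve $\Phi(\tfrac{1}{2\sigma} - \varepsilon\sigma) - e^{\varepsilon}\Phi(-\tfrac{1}{2\sigma} - \varepsilon\sigma)$, which gives a quick sanity check.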

Theorems & Definitions (19)

  • Definition 1: $(\varepsilon, \delta)$-Differential Privacy
  • Definition 2: Neighbouring Datasets
  • Definition 3: Hockey-stick Divergence
  • Definition 4: Privacy Curves
  • Definition 5: Privacy Loss Distribution
  • Definition 6: Subsampling
  • Definition 7: Dominating Pair of Distributions (zhu22-optimal-characteristic-functions)
  • Theorem 8: Following Theorem 10 of zhu22-optimal-characteristic-functions
  • Proposition 9
  • Theorem 10: Theorem 11 of zhu22-optimal-characteristic-functions
  • ...and 9 more
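Definitions 3 to 5 fit together as follows on a finite example. Assuming the standard forms $H_\alpha(P \| Q) = \sup_S\,[P(S) - \alpha Q(S)]$, privacy curve $\delta(\varepsilon) = H_{e^\varepsilon}(P \| Q)$, and privacy loss $L(x) = \log(P(x)/Q(x))$ with $x \sim P$ (the paper's exact statements are not reproduced in this summary), the sketch below computes $\delta(\varepsilon)$ three ways: by brute force over all events, in closed form, and as an expectation over the PLD. The distributions and function names are made up for illustration.

```python
import itertools
import math

def hockey_stick_sup(p, q, eps):
    """sup over events S of P(S) - e^eps * Q(S), by brute force over all subsets."""
    alpha = math.exp(eps)
    support = list(p)
    best = 0.0  # the empty event contributes 0
    for r in range(1, len(support) + 1):
        for event in itertools.combinations(support, r):
            best = max(best, sum(p[x] - alpha * q[x] for x in event))
    return best

def hockey_stick(p, q, eps):
    """Closed form: the supremum is attained by S = {x : P(x) > e^eps Q(x)}."""
    alpha = math.exp(eps)
    return sum(max(0.0, p[x] - alpha * q[x]) for x in p)

def privacy_curve_from_pld(p, q, eps):
    """delta(eps) = E_{x~P}[max(0, 1 - e^(eps - L(x)))] with L(x) = log(P(x)/Q(x)).
    Assumes P and Q are positive on the same finite support."""
    return sum(p[x] * max(0.0, 1.0 - math.exp(eps - math.log(p[x] / q[x])))
               for x in p)

# Hypothetical output distributions of a mechanism on two neighbouring datasets.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.2, "b": 0.3, "c": 0.5}

for eps in [0.0, 0.5, 1.0]:
    print(eps, hockey_stick_sup(P, Q, eps), hockey_stick(P, Q, eps),
          privacy_curve_from_pld(P, Q, eps))
```

The brute-force search is exponential in the support size and is included only to make the supremum in the definition concrete.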