Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition
Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke
TL;DR
This work addresses the problem of obtaining tight $(\varepsilon,\delta)$-DP guarantees for subsampled mechanisms under composition, focusing on two common pitfalls: the assumption that the worst-case datasets for a single subsampled mechanism also determine the guarantee under self-composition, and the assumption that Poisson subsampling and sampling without replacement (WOR) yield interchangeable privacy guarantees. It uses privacy loss distributions (PLDs) and the dominating-pair framework to analyze both the add/remove and substitution neighbouring relations, and constructs explicit dominating pairs for the Gaussian and Laplace mechanisms under various subsampling schemes. Key contributions include counterexamples showing that worst-case datasets may not remain worst-case under composition, explicit dominating-pair constructions (including for the Gaussian mechanism under WOR with substitution) with associated PLD-based bounds, and a detailed comparison demonstrating that Poisson and WOR can yield significantly different privacy parameters in DP-SGD-like settings. In practice, the findings suggest that practitioners should match the accounting method to the subsampling technique actually used, report accounting hyperparameters, and exercise caution when comparing privacy-performance trade-offs across papers, particularly for DP-SGD and related algorithms.
Abstract
We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
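To make the distinction between the two sampling schemes concrete, the following is a minimal Python sketch of how a batch is drawn under each. The function names, dataset size, and sampling rate are illustrative assumptions, not taken from the paper or from any privacy accounting library, and the sketch covers only the sampling step, not the privacy accounting itself.

```python
import random


def poisson_subsample(data, q, rng):
    """Include each record independently with probability q.

    The resulting batch size is random (binomially distributed).
    """
    return [x for x in data if rng.random() < q]


def sample_without_replacement(data, batch_size, rng):
    """Draw a uniformly random batch of exactly batch_size distinct records."""
    return rng.sample(data, batch_size)


# Illustrative values only (assumptions, not from the paper).
rng = random.Random(0)
data = list(range(1000))
q = 0.05          # Poisson sampling probability
batch_size = 50   # fixed batch size, equal to q * len(data)

poisson_batch = poisson_subsample(data, q, rng)
wor_batch = sample_without_replacement(data, batch_size, rng)

print("Poisson batch size:", len(poisson_batch))  # varies from draw to draw
print("WOR batch size:", len(wor_batch))          # always batch_size
```

Both schemes have the same expected batch size here, but Poisson batches vary in size while WOR batches are always exactly `batch_size`. This structural difference is one reason an accountant written for one scheme should not be assumed to apply to the other.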
