Efficient privacy loss accounting for subsampling and random allocation

Vitaly Feldman, Moshe Shenfeld

TL;DR

This work addresses the problem of tightly accounting for privacy loss under $k$-out-of-$t$ random allocation, a subsampling scheme relevant to DP-SGD and privacy-preserving communication. It develops a PLD-based framework using dominating pairs to compute the PLD of random allocation via log-sum-exp expressions and $t$-wise convolutions, with a reduction from general $k$-out-of-$t$ to the $k=1$ case. The authors provide an efficient algorithm with runtime $O(\log^{3}(t)\cdot \log(t/\beta)/\alpha^{2})$, establish discretization-based PLD realizations, and demonstrate numerically that their bounds are competitive with or tighter than prior analytic methods and Monte Carlo estimates, yielding improved privacy-utility trade-offs for DP-SGD with PREAMBLE. This PLD-centered accounting enables accurate, scalable privacy analysis for complex subsampling schemes and nested compositions, facilitating practical deployment of privacy-preserving learning and data aggregation systems.

Abstract

We consider the privacy amplification properties of a sampling scheme in which a user's data is used in $k$ steps chosen randomly and uniformly from a sequence (or set) of $t$ steps. This sampling scheme has recently been applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), where it was shown to have utility advantages over standard Poisson sampling. Theoretical analyses of this sampling scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) lead to bounds that are close to those of Poisson sampling, yet still have two significant shortcomings. First, in many practical settings, the resulting privacy parameters are not tight due to the approximation steps in the analysis. Second, the computed parameters are either the hockey-stick or the Rényi divergence, both of which introduce overheads when used in privacy loss accounting. In this work, we demonstrate that the privacy loss distribution (PLD) of random allocation applied to any differentially private algorithm can be computed efficiently. When applied to the Gaussian mechanism, our results demonstrate that the privacy-utility trade-off for random allocation is at least as good as that of Poisson subsampling. In particular, random allocation is better suited for training via DP-SGD. To support these computations, our work develops new tools for general privacy loss accounting based on a notion of PLD realization. This notion allows us to extend accurate privacy loss accounting to subsampling, which previously required manual, noise-mechanism-specific analysis.
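To make the sampling scheme in the abstract concrete, the following is a minimal sketch of how a single user's participation steps might be drawn under $k$-out-of-$t$ random allocation, next to Poisson sampling with rate $k/t$. The function names `random_allocation` and `poisson_sampling` and the 0-based step indexing are assumptions made for illustration; they do not come from the paper, and the sketch covers only the sampling step, not the privacy accounting.

```python
import random


def random_allocation(t, k, rng):
    """Choose k distinct steps uniformly at random out of t steps (0-indexed)."""
    if not 1 <= k <= t:
        raise ValueError("k must satisfy 1 <= k <= t")
    return sorted(rng.sample(range(t), k))


def poisson_sampling(t, rate, rng):
    """Include each of the t steps independently with probability `rate`."""
    return [step for step in range(t) if rng.random() < rate]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
t, k = 10, 2

allocated_steps = random_allocation(t, k, rng)
poisson_steps = poisson_sampling(t, k / t, rng)

# Random allocation always uses exactly k steps; Poisson sampling uses
# k steps only in expectation, so its count varies from run to run.
print("random allocation:", allocated_steps)
print("Poisson sampling: ", poisson_steps)
```

The difference visible here is that the number of participations is fixed under random allocation and random under Poisson sampling.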

Paper Structure (23 sections, 7 theorems, 29 equations, 11 figures, 9 algorithms)

This paper contains 23 sections, 7 theorems, 29 equations, 11 figures, 9 algorithms.

Key Result

Lemma 2.7

Given $t \in \mathbb{N}$, $k \in [t]$, and an algorithm $M$ dominated by a randomizer $R$, we have $\vec{\delta}_{\mathcal{A}_{t, k}\left(M\right)}(\varepsilon) \le \vec{\delta}_{\mathcal{A}_{t, k}\left(R\right)}(\varepsilon)$ and $\overleftarrow{\delta}_{\mathcal{A}_{t, k}\left(M\right)}(\varepsilon)$ …

Figures (11)

  • Figure 1: Upper and lower bounds on the privacy parameter $\varepsilon$ as a function of the noise parameter $\sigma$ for various values of $t$, all using the Gaussian mechanism with fixed $\delta = 10^{-6}$. We compare our upper and lower bounds (which are nearly identical) to the upper bounds on random allocation from FS25 and DCO25, and to the Poisson scheme with $\lambda = 1/t$.
  • Figure 2: Comparison of the privacy profile of the Poisson scheme with various bounds for the random allocation scheme: the combined methods in FS25, the high-probability and average estimates using Monte Carlo simulation and the lower bound by CGHLKKMSZ24, and our numerical method. The setting follows CGHLKKMSZ24.
  • Figure 3: Runtime as a function of accuracy $\alpha$ and number of steps $t$ on an Apple MacBook Pro M1, using $\sigma = 1$.
  • Figure 4: Analytical and empirical squared error for the Poisson and random allocation schemes, using both the combined method in FS25 and our PLD accounting, for various values of $\varepsilon$ and $d$ (which corresponds to an increase in sensitivity). We set $p = 0.9$, $t = 10^{3}$, $\delta = 10^{-10}$.
  • Figure 5: The ratio between the noise level required to achieve $(\varepsilon=1, \delta=10^{-6})$-DP using the PREAMBLE method and simple Gaussian noise addition for DP-SGD calculated via RDP and numerical accounting.
  • ...and 6 more figures

Theorems & Definitions (35)

  • Definition 2.1: PLD (DR16)
  • Definition 2.2: Hockey-stick divergence (BKOZB12)
  • Definition 2.3: Privacy profile (BBG18)
  • Definition 2.4: Differential privacy (DKMMN06)
  • Definition 2.5: Dominating pair (ZDW22)
  • Definition 2.6: Dominating randomizer
  • Lemma 2.7: Allocation reduction to randomizer (FS25)
  • Lemma 2.8: Reduction to a single allocation (FS25)
  • Claim 2.9: Dominating pair of distributions for random allocation (FS25)
  • Definition 3.1: PLD realization
  • ...and 25 more
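
As an illustration of how the hockey-stick divergence (Definition 2.2) and the privacy profile (Definition 2.3) relate, the sketch below evaluates the privacy profile of the plain Gaussian mechanism in two ways: with the standard closed form, and by numerically integrating $\max(0, p(x) - e^{\varepsilon} q(x))$ for $P = N(\Delta, \sigma^2)$ and $Q = N(0, \sigma^2)$. This is not the paper's algorithm for random allocation; the function names, the integration grid, and the choice of sensitivity $\Delta = 1$ are assumptions made for this example.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_privacy_profile(eps, sigma, sensitivity=1.0):
    """Closed-form delta(eps) for the Gaussian mechanism (no subsampling)."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)


def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hockey_stick_numeric(eps, sigma, sensitivity=1.0, n=20000):
    """Midpoint-rule estimate of the integral of max(0, p - e^eps * q)."""
    lo = -10.0 * sigma
    hi = sensitivity + 10.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        p = normal_pdf(x, sensitivity, sigma)
        q = normal_pdf(x, 0.0, sigma)
        total += max(0.0, p - math.exp(eps) * q)
    return total * h


closed_form = gaussian_privacy_profile(eps=1.0, sigma=1.0)
numeric = hockey_stick_numeric(eps=1.0, sigma=1.0)
print(closed_form, numeric)  # the two estimates should agree closely
```

The numerical route is the one that generalizes: when no closed form exists, as for the subsampling schemes discussed here, the privacy profile has to be obtained from a numerical representation of the privacy loss.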