Composition of Differential Privacy & Privacy Amplification by Subsampling

Thomas Steinke

TL;DR

This chapter develops a unified framework for analyzing how privacy degrades under repeated DP analyses through the lens of privacy loss distributions (PLDs). It introduces concentrated DP (CDP) and Rényi DP (RDP) as refined tools that yield tighter composition than classical pure or approximate DP, and shows how Gaussian mechanisms naturally satisfy these notions. A central theme is privacy amplification by subsampling (PAS), with tight results for Poisson and fixed-size subsampling, and practical guidance for integrating PAS into iterative algorithms like SGD. The work connects fundamental bounds (e.g., advanced composition, optimal composition) to actionable accounting methods, enabling private data reuse while maintaining utility. Collectively, it provides rigorous techniques for privacy budgeting in complex, multi-round, or subsampled analyses used in AI applications.

Abstract

This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of privacy amplification by subsampling. This chapter introduces the basic concepts and gives proofs of the key results needed to apply these tools in practice.

Paper Structure

This paper contains 26 sections, 35 theorems, 138 equations, and 2 figures.

Key Result

Theorem 1

Let $M_1, M_2, \cdots, M_k : \mathcal{X}^n \to \mathcal{Y}$ be randomized algorithms. Suppose $M_j$ is $\varepsilon_j$-DP for each $j \in [k]$. Define $M : \mathcal{X}^n \to \mathcal{Y}^k$ by $M(x)=(M_1(x),M_2(x),\cdots,M_k(x))$, where each algorithm is run independently. Then $M$ is $\varepsilon$-DP for $\varepsilon = \sum_{j=1}^k \varepsilon_j$.
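As a rough illustration of why tighter accounting matters, the sketch below compares basic composition (summing the pure-DP parameters) with a bound obtained by passing through zero-concentrated DP. It assumes two standard facts: an $\varepsilon$-DP algorithm satisfies $\tfrac{1}{2}\varepsilon^2$-zCDP, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \log(1/\delta)}, \delta)$-DP. This is the simple conversion rather than the improved one the chapter refers to, and the function names are hypothetical.

```python
import math


def basic_composition(epsilons):
    """Basic composition: the pure-DP parameters simply add up."""
    return sum(epsilons)


def zcdp_composition(epsilons, delta):
    """Approximate-DP epsilon obtained via zCDP (assumed standard conversion).

    Each eps-DP algorithm is treated as (eps^2 / 2)-zCDP, zCDP parameters add
    under composition, and rho-zCDP is converted to (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * log(1 / delta)).
    """
    rho = sum(0.5 * eps * eps for eps in epsilons)
    via_zcdp = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))
    # Basic composition always holds, so report the smaller of the two.
    return min(via_zcdp, basic_composition(epsilons))


if __name__ == "__main__":
    delta = 1e-6
    for k in (10, 100, 1000):
        epsilons = [0.1] * k
        print(k, basic_composition(epsilons), zcdp_composition(epsilons, delta))
```

For small $k$ the sum is the better bound, while for large $k$ the zCDP route grows roughly like $\sqrt{k}$ instead of $k$.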

Figures (2)

  • Figure 1: Comparison of different composition bounds. We compose $k$ independent $0.1$-DP algorithms to obtain an $(\varepsilon,10^{-6})$-DP guarantee. Basic composition (Theorem \ref{thm:basic_composition}) gives $\varepsilon=k\cdot 0.1$. For comparison, we have advanced composition (Theorem \ref{thm:advancedcomposition_pure}), an optimal bound \cite{kairouz2015composition}, and Concentrated DP (CDP) with the improved conversion from Proposition \ref{prop:cdp2adp}. We also consider composing the Gaussian mechanism using Corollary \ref{cor:gauss_adp_exact_multi}, where the Gaussian noise is scaled to have the same variance as the Laplace noise needed to attain $0.1$-DP.
  • Figure 2: Comparison of Rényi divergence guarantees for Poisson subsampling, i.e., including each person independently with probability $p=0.05$. The unamplified algorithm satisfies $0.5$-zCDP. The exact bound is given by Theorem \ref{thm:rdp_subsampling}. For comparison, we have the analytic upper bound from Proposition \ref{prop:divergence_subsampling_analytic} as well as the behaviour in the limit given by Proposition \ref{prop:rdp_subsampling_jensen}.
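
To make the Poisson subsampling setting concrete, here is a minimal sketch of the sampling step together with the classical pure-DP amplification bound. It assumes add/remove neighbouring datasets and the standard result that running an $\varepsilon$-DP algorithm on a Poisson subsample with rate $p$ is $\log(1 + p(e^{\varepsilon} - 1))$-DP. This is not the Rényi divergence bound plotted in Figure 2, and the function names are hypothetical.

```python
import math
import random


def poisson_subsample(dataset, p, rng=None):
    """Include each record independently with probability p."""
    rng = rng or random.Random()
    return [record for record in dataset if rng.random() < p]


def amplified_epsilon(epsilon, p):
    """Pure-DP amplification: log(1 + p * (exp(epsilon) - 1)).

    log1p and expm1 keep the computation accurate for small epsilon and p.
    """
    return math.log1p(p * math.expm1(epsilon))


if __name__ == "__main__":
    data = list(range(1000))
    sample = poisson_subsample(data, p=0.05, rng=random.Random(0))
    print("subsample size:", len(sample))
    print("amplified epsilon:", amplified_epsilon(1.0, 0.05))
```

For small $\varepsilon$ the amplified parameter is approximately $p \cdot \varepsilon$, which is why subsampling is attractive in iterative algorithms such as SGD.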

Theorems & Definitions (76)

  • Theorem 1: Basic Composition
  • proof
  • Definition 2: Privacy Loss Distribution
  • Proposition 3: Privacy Loss Distribution of Gaussian
  • proof
  • Lemma 4: Neyman-Pearson Lemma \cite{neyman1933ix}
  • Remark 5
  • Lemma 6: Change of Distribution for Privacy Loss
  • proof
  • Proposition 7: Conversion from Privacy Loss Distribution to Approximate Differential Privacy
  • ...and 66 more