Composition of Differential Privacy & Privacy Amplification by Subsampling

Thomas Steinke

TL;DR

This chapter develops a unified framework, based on privacy loss distributions (PLDs), for analyzing how privacy degrades when differentially private (DP) analyses are repeated on the same data. It introduces concentrated DP (CDP) and Rényi DP (RDP) as refined tools that yield tighter composition bounds than classical pure or approximate DP, and shows that the Gaussian mechanism naturally satisfies these notions. A central theme is privacy amplification by subsampling, with tight results for Poisson and fixed-size subsampling and practical guidance for using subsampling in iterative algorithms such as SGD. The chapter connects fundamental bounds (e.g., advanced composition, optimal composition) to practical accounting methods, enabling private data reuse while maintaining utility. Together, these results give rigorous techniques for privacy budgeting in the multi-round or subsampled analyses used in AI applications.

Abstract

This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of privacy amplification by subsampling. This chapter introduces the basic concepts and gives proofs of the key results needed to apply these tools in practice.

Paper Structure (26 sections, 35 theorems, 138 equations, 2 figures)

Introduction
Basic Composition
Is Basic Composition Optimal?
Privacy Loss Distributions
Privacy Loss of Gaussian Noise Addition
Statistical Hypothesis Testing Perspective
Approximate DP & the Privacy Loss Distribution
Composition via the Privacy Loss Distribution
Basic Composition, Revisited
Gaussian Composition
Composition via Gaussian Approximation
Concentrated Differential Privacy
Adaptive Composition & Postprocessing
Composition of Approximate $(\varepsilon,\delta)$-DP
Asymptotic Optimality of Composition
...and 11 more sections

Key Result

Theorem 1

Let $M_1, M_2, \cdots, M_k : \mathcal{X}^n \to \mathcal{Y}$ be randomized algorithms. Suppose $M_j$ is $\varepsilon_j$-DP for each $j \in [k]$. Define $M : \mathcal{X}^n \to \mathcal{Y}^k$ by $M(x)=(M_1(x),M_2(x),\cdots,M_k(x))$, where each algorithm is run independently. Then $M$ is $\varepsilon$-DP for $\varepsilon = \sum_{j=1}^k \varepsilon_j$.
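As a minimal sketch of how basic composition is used for privacy budgeting, the snippet below sums the per-algorithm pure-DP parameters to obtain the guarantee for running all of them independently. The function name `basic_composition` is hypothetical and chosen for illustration; it is not taken from the chapter.

```python
import math
from typing import Sequence


def basic_composition(epsilons: Sequence[float]) -> float:
    """Total pure-DP parameter for independently run eps_j-DP algorithms.

    Basic composition: the combined output is eps-DP with eps = sum_j eps_j.
    (Hypothetical helper name, for illustration only.)
    """
    if any(e < 0 for e in epsilons):
        raise ValueError("privacy parameters must be non-negative")
    # fsum limits floating-point rounding error when many small terms are added.
    return math.fsum(epsilons)


# Example: ten independent 0.1-DP analyses on the same dataset.
total_epsilon = basic_composition([0.1] * 10)
print(f"composed guarantee: {total_epsilon:.3f}-DP")
```

The bound grows linearly in the number of analyses, which is why the later sections look for tighter accounting.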

Figures (2)

Figure 1: Comparison of different composition bounds. We compose $k$ independent $0.1$-DP algorithms to obtain an $(\varepsilon,10^{-6})$-DP guarantee. Basic composition (Theorem 1) gives $\varepsilon=k\cdot 0.1$. For comparison, we have advanced composition (Theorem \ref{thm:advancedcomposition_pure}), an optimal bound kairouz2015composition, and Concentrated DP (CDP) with the improved conversion from Proposition \ref{prop:cdp2adp}. We also consider composing the Gaussian mechanism using Corollary \ref{cor:gauss_adp_exact_multi}, where the Gaussian noise is scaled to have the same variance as the Laplace noise needed to attain $0.1$-DP.
Figure 2: Comparison of Rényi divergence guarantees for Poisson subsampling, i.e., including each person independently with probability $p=0.05$. The unamplified algorithm satisfies $0.5$-zCDP. The exact bound is given by Theorem \ref{thm:rdp_subsampling}. For comparison, we have the analytic upper bound from Proposition \ref{prop:divergence_subsampling_analytic} as well as the behaviour in the limit given by Proposition \ref{prop:rdp_subsampling_jensen}.
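The setting of Figure 1 can be roughly reproduced for two of the bounds. The sketch below compares basic composition with a zCDP-based bound for $k$ independent $0.1$-DP algorithms at $\delta=10^{-6}$. It assumes three standard facts about zero-concentrated DP: $\varepsilon$-DP implies $\tfrac{1}{2}\varepsilon^2$-zCDP, zCDP parameters add under composition, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho\ln(1/\delta)},\delta)$-DP. This is the simple conversion, not the improved conversion or the optimal bound plotted in the figure, so its values will be somewhat looser than those curves. The function names are hypothetical.

```python
import math


def basic_epsilon(k: int, eps: float) -> float:
    """Basic composition of k eps-DP algorithms: total is k * eps."""
    return k * eps


def zcdp_epsilon(k: int, eps: float, delta: float) -> float:
    """(epsilon, delta)-DP bound for k eps-DP algorithms via zCDP.

    Assumed standard facts (not the chapter's improved conversion):
      * eps-DP implies (eps^2 / 2)-zCDP,
      * zCDP parameters add up under composition,
      * rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP.
    """
    rho = k * eps**2 / 2.0
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


if __name__ == "__main__":
    eps, delta = 0.1, 1e-6
    for k in (1, 10, 100, 1000):
        basic = basic_epsilon(k, eps)
        via_zcdp = zcdp_epsilon(k, eps, delta)
        # Both bounds hold simultaneously, so the smaller one may be reported.
        print(f"k={k:5d}  basic={basic:8.3f}  zCDP={via_zcdp:8.3f}  "
              f"best={min(basic, via_zcdp):8.3f}")
```

For small $k$ the basic bound is the smaller of the two, while for large $k$ the zCDP-based bound grows roughly like $\sqrt{k}$ rather than $k$. Since both guarantees hold at once, taking their minimum is valid.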

Theorems & Definitions (76)

Theorem 1: Basic Composition
proof
Definition 2: Privacy Loss Distribution
Proposition 3: Privacy Loss Distribution of Gaussian
proof
Lemma 4: Neyman-Pearson Lemma neyman1933ix
Remark 5
Lemma 6: Change of Distribution for Privacy Loss
proof
Proposition 7: Conversion from Privacy Loss Distribution to Approximate Differential Privacy
...and 66 more
