An information theorist's tour of differential privacy

Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

TL;DR

This work presents differential privacy through an information-theoretic perspective, modeling DP mechanisms as channels and connecting privacy guarantees to hypothesis-testing tradeoffs and divergences such as $D_{\mathrm{KL}}$, $f$-divergences, and $E_{\gamma}$ (hockey-stick) divergence. It introduces the privacy loss random variable (PLRV) $L_{\mathcal{D},\mathcal{D}'}$ and explores how its distribution governs DP guarantees, including composition and Gaussian DP formulations. The article surveys exact and approximate accounting methods for privacy loss under multiple queries, such as FFT-based, cumulant generating function, and saddle-point approaches, and discusses optimal noise distributions (e.g., Cactus, Schrödinger mechanisms) in large and small composition regimes. Finally, it addresses practical implications for ML (e.g., DP-SGD, subsampling amplification), synthetic data, and the challenge of choosing interpretable privacy parameters, highlighting opportunities for further information-theoretic insights to improve privacy guarantees and utility.

Abstract

Since being proposed in 2006, differential privacy has become a standard method for quantifying certain risks in publishing or sharing analyses of sensitive data. At its heart, differential privacy measures risk in terms of the differences between probability distributions, which is a central topic in information theory. A differentially private algorithm is a channel between the underlying data and the output of the analysis. Seen in this way, the guarantees made by differential privacy can be understood in terms of properties of this channel. In this article we examine a few of the key connections between information theory and the formulation/application of differential privacy, giving an "operational significance" for relevant information measures.

Paper Structure

This paper contains 4 sections, 4 equations, 2 figures.

Figures (2)

  • Figure 1: Guessing a single secret bit from leaked information.
  • Figure 2: Error tradeoffs for additive Gaussian and Laplace noise. For the Gaussian (left), $Y \sim \mathcal{N}(0,\sigma^2)$ when $s = 0$ and $Y \sim \mathcal{N}(1,\sigma^2)$ when $s = 1$. As $\sigma$ increases, the error tradeoffs move closer and closer to the diagonal: more noise means more privacy, since Aditya's test is harder. For the Laplace (right), $Y$ has density $\frac{\lambda}{2} \exp(- \lambda | y - s |)$. As $\lambda$ increases, the distribution becomes more concentrated and offers less privacy. The shapes of the tradeoffs differ depending on the distribution of the noise.
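
The tradeoff curves described for Figure 2 can be sketched numerically. The following is a minimal illustration, not code from the paper: it assumes the adversary uses a threshold test on $Y$ (guess $s = 1$ when $Y$ exceeds a threshold), which gives the optimal curve for these two shift families, and it writes the type II error $\beta$ as a function of the type I error $\alpha$ via $\beta = F(F^{-1}(1-\alpha) - \mu)$, where $F$ is the CDF of the standardized noise and $\mu$ is the standardized shift ($1/\sigma$ for the Gaussian, $\lambda$ for the Laplace). The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gaussian_tradeoff(alpha, sigma):
    """Type II error of the threshold test between N(0, sigma^2) and
    N(1, sigma^2) at type I error alpha, for 0 < alpha < 1."""
    # Dividing Y by sigma turns the unit shift into a shift of 1/sigma.
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - 1.0 / sigma)


def _laplace_cdf(x):
    # CDF of the unit-rate Laplace density (1/2) * exp(-|x|).
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def _laplace_quantile(p):
    # Inverse of _laplace_cdf for 0 < p < 1.
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))


def laplace_tradeoff(alpha, lam):
    """Type II error of the threshold test between Laplace noise centered
    at 0 and at 1 with rate lam, at type I error alpha, for 0 < alpha < 1."""
    # Multiplying Y by lam gives unit-rate Laplace noise with shift lam.
    return _laplace_cdf(_laplace_quantile(1.0 - alpha) - lam)


if __name__ == "__main__":
    # Larger sigma pushes beta toward the diagonal value 1 - alpha,
    # while larger lam pulls it away from the diagonal.
    for alpha in (0.05, 0.25, 0.5):
        print(alpha,
              gaussian_tradeoff(alpha, 1.0), gaussian_tradeoff(alpha, 4.0),
              laplace_tradeoff(alpha, 0.5), laplace_tradeoff(alpha, 2.0))
```

Evaluating both functions on a grid of $\alpha$ values and plotting $\beta$ against $\alpha$ should reproduce the qualitative behavior in the caption: the Gaussian curve is smooth, while the Laplace curve is piecewise, with linear segments near $\alpha = 0$ and $\alpha = 1$.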