A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation and Blackwell's Theorem

Weijie J. Su

TL;DR

This review argues that differential privacy can be considered a pure statistical concept. It defines $f$-differential privacy, which extends other differential privacy definitions through a representation theorem, and reviews techniques that make $f$-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning.

Abstract

Differential privacy is widely considered the formal privacy standard for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Despite originating in the cryptographic context, in this review paper we argue that, fundamentally, differential privacy can be considered a pure statistical concept. By leveraging David Blackwell's informativeness theorem, our focus is to demonstrate, based on prior work, that all definitions of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of $f$-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render $f$-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and U.S. Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared to existing alternatives.

Paper Structure (16 sections, 5 theorems, 37 equations, 2 figures, 3 tables)

Key Result

Theorem 3

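Under Axioms axiom:ht and axim, any differential privacy definition must have its metric $D$ depend on the probability distributions through the trade-off function. That is, there must exist a link function $d$ defined on trade-off functions such that $D(P, Q) = d(T(P, Q))$, where $P$ and $Q$ are the two distributions being compared and $T(P, Q)$ is their trade-off function.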

Figures (2)

  • Figure 1: The graph shows three different trade-off functions for $T(M(S), M(S'))$. Among these, only the dashed line represents a trade-off function that satisfies $f$-differential privacy. Adapted from Figure 2 in dong2022gaussian.
  • Figure 2: Comparison between different privacy accountants in terms of the privacy parameter $\epsilon$ with $\delta = 10^{-5}$ in private federated analytics. GDP, Edgeworth, and RDP correspond to the privacy bounds obtained from bu2020deep, wang2022analytical, and wang2018subsampled, respectively. FFT- and FFT+ denote the lower and upper bounds derived by the numerical methods of koskela2020computing and gopi2021numerical, which sandwich the Edgeworth bound tightly. For experimental details, see wang2022analytical.
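
The quantities in the two figures can be made concrete with a small sketch. It assumes the standard Gaussian differential privacy formulas from the literature cited here as dong2022gaussian, which this summary does not state: the Gaussian trade-off function $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and the fact that a $\mu$-GDP mechanism is $(\epsilon, \delta(\epsilon))$-DP with $\delta(\epsilon) = \Phi(-\epsilon/\mu + \mu/2) - e^{\epsilon}\,\Phi(-\epsilon/\mu - \mu/2)$. The function names and the bisection routine are illustrative choices, not code from the paper, and the sketch does not reproduce the accountants compared in Figure 2.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used only for its quantile function


def phi(x: float) -> float:
    """Standard normal CDF, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Gaussian trade-off function G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu).

    alpha is the type I error; the return value is the smallest achievable
    type II error when testing N(0, 1) against N(mu, 1).
    """
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return phi(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) such that a mu-GDP mechanism is (eps, delta(eps))-DP."""
    return phi(-eps / mu + mu / 2.0) - math.exp(eps) * phi(-eps / mu - mu / 2.0)


def gdp_epsilon(mu: float, delta: float, hi: float = 100.0, tol: float = 1e-10) -> float:
    """Approximate the smallest eps with gdp_delta(eps, mu) <= delta by bisection.

    Relies on delta(eps) being decreasing in eps; the upper bracket `hi` is an
    arbitrary assumption of this sketch.
    """
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Trade-off curve for mu = 1 at a few type I error levels.
    for alpha in (0.01, 0.05, 0.25, 0.5):
        print(alpha, gaussian_tradeoff(alpha, mu=1.0))
    # Epsilon at delta = 1e-5, the value of delta used in Figure 2.
    print(gdp_epsilon(mu=1.0, delta=1e-5))
```

With $\mu = 0$ the trade-off function reduces to $1 - \alpha$, the case where the two hypotheses are indistinguishable; larger $\mu$ pulls the curve toward zero, which corresponds to a weaker privacy guarantee and a larger $\epsilon$ at a fixed $\delta$.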

Theorems & Definitions (7)

  • Theorem 3: Representation Theorem
  • Remark 2.1
  • Theorem 4: blackwell1950comparison
  • Proposition 2.2: concentrated2, dong2022gaussian
  • Definition 3.1: dong2022gaussian
  • Theorem 5: wasserman_zhou, KOV, dong2022gaussian
  • Proposition 3.2: wang2022analytical