A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy

Ryan Rogers

TL;DR

The paper addresses privacy-preserving release of histograms when the domain is unknown, a common practical challenge in SQL-like queries. It introduces a unified framework based on approximate concentrated differential privacy (CDP) and a three-part lemma to analyze diverse unknown-domain algorithms through differentiating events, a coupling mechanism $A'$, and pure CDP. The framework is then applied to multiple algorithm families, including Positive Count Histograms, Top-$(\bar{k}+1)$ Histograms, Exponential Mechanism variants, Pay-what-you-get compositions, and Continual Observation with the Binary Mechanism, yielding tighter CDP guarantees and simplifying composition. This approach facilitates safer integration of private histogram algorithms into existing data systems by providing robust, composable privacy guarantees with explicit noise and threshold choices. Overall, the work demonstrates how CDP-based analyses can unify and improve privacy bounds across a spectrum of unknown-domain histogram techniques with practical implications for data analytics pipelines.
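The point that CDP-based analyses simplify composition can be made concrete with standard zero-concentrated DP (zCDP) accounting from Bun and Steinke'16. Three facts are used: a pure $\varepsilon$-DP mechanism is $(\varepsilon^2/2)$-zCDP, zCDP parameters $\rho$ add under composition, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)$-DP for any $\delta > 0$. The sketch below covers only pure zCDP. It omits the extra $\delta$ term that the approximate variant used in the paper carries, and the function names and the 100-release workload are hypothetical.

```python
import math


def zcdp_to_dp_epsilon(rho, delta):
    """rho-zCDP implies (epsilon, delta)-DP with this epsilon, for delta in (0, 1)."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


def compose_pure_dp(eps_each, k, delta):
    """Total epsilon for k mechanisms that are each eps_each-DP, accounted two ways.

    basic: the epsilons simply add up (pure DP, delta = 0).
    zcdp:  each eps-DP mechanism is (eps^2 / 2)-zCDP, the rhos add up,
           and the total is converted to (epsilon, delta)-DP once at the end.
    """
    basic = k * eps_each
    rho_total = k * eps_each ** 2 / 2.0
    return basic, zcdp_to_dp_epsilon(rho_total, delta)


# Hypothetical workload: 100 releases, each 0.1-DP, reported at delta = 1e-6.
basic_eps, zcdp_eps = compose_pure_dp(eps_each=0.1, k=100, delta=1e-6)
print(f"basic composition: eps = {basic_eps:.2f}")
print(f"via zCDP:          eps = {zcdp_eps:.2f}")
```

With these numbers basic composition gives $\varepsilon = 10$ with $\delta = 0$, while the zCDP route gives $\varepsilon \approx 5.76$ at $\delta = 10^{-6}$. For small per-release $\varepsilon$, the zCDP total grows roughly like $\sqrt{k}$ rather than $k$, at the cost of a nonzero $\delta$.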

Abstract

There are many existing differentially private algorithms for releasing histograms, i.e., counts with corresponding labels, in various settings. Our focus in this survey is to revisit some of the existing differentially private algorithms for releasing histograms over unknown domains, i.e., the labels of the counts that are to be released are not known beforehand. The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels: labels that are absent from the original histogram but could appear in the histogram of a hypothetical neighboring dataset. However, the challenge in designing differentially private algorithms for releasing histograms over an unknown domain is that some outcomes can reveal exactly which input was used, clearly violating privacy. The goal then is to show that these differentiating outcomes occur with very low probability. We present a unified framework for the privacy analyses of several existing algorithms. Furthermore, our analysis uses approximate concentrated differential privacy from Bun and Steinke'16, which can yield better privacy loss parameters than a direct differential privacy analysis, especially when many of these algorithms are composed together in an overall system.
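The unknown-domain idea and its differentiating outcomes can be illustrated with a minimal positive-count histogram sketch: only labels that actually occur in the data receive noise, and a label is released only if its noisy count clears a threshold. We assume Gaussian noise with standard deviation `sigma` and a fixed threshold `tau`; both are hypothetical, uncalibrated parameters. Choosing them to meet a stated approximate CDP guarantee is what the paper's analysis supplies, and that calibration is not reproduced here.

```python
import random
from collections import Counter


def positive_count_histogram(items, sigma, tau, rng=None):
    """Release noisy counts only for labels that actually occur in the data.

    Hypothetical parameters: sigma is the standard deviation of the Gaussian
    noise added to each positive count, and tau is the release threshold.
    Neither is calibrated to a privacy guarantee in this sketch.
    """
    rng = rng or random.Random()
    counts = Counter(items)  # only labels present in the input; no domain is enumerated
    released = {}
    for label, count in counts.items():
        noisy = count + rng.gauss(0.0, sigma)
        # A label present in x but absent from a neighboring x' can only be output
        # under x; the threshold is what keeps that differentiating outcome rare.
        if noisy > tau:
            released[label] = noisy
    return released


# Example: "rare" appears once, so with tau = 5 it is very unlikely to be released.
print(positive_count_histogram(["a"] * 20 + ["b"] * 8 + ["rare"], sigma=1.0, tau=5.0,
                               rng=random.Random(0)))
```

Because the output never contains a label absent from the input, no missing labels need to be filled in; the price is that the privacy argument must bound the probability of releasing a label that only one of two neighboring datasets contains.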

Paper Structure

This paper contains 11 sections, 13 theorems, 20 equations, and 4 algorithms.

Key Result

Theorem 1

Let $f: \mathcal{X} \to \mathbb{R}^d$ have $\ell_1$-sensitivity $\Delta_1(f)$. Then the mechanism $M: \mathcal{X} \to \mathbb{R}^d$ given by $M(x) = f(x) + (Z_1, \cdots, Z_d)$, where $Z_1, \ldots, Z_d \stackrel{i.i.d.}{\sim} \mathrm{Lap}(\Delta_1(f)/\varepsilon)$, is $\varepsilon$-DP for any $\varepsilon>0$.
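Theorem 1 can be sketched directly in code. Note that $f(x) \in \mathbb{R}^d$ has a fixed dimension $d$, so the Laplace mechanism needs the full domain in advance, which is exactly what unknown-domain algorithms avoid. The sampler uses the fact that the difference of two i.i.d. exponential draws with mean $b$ is $\mathrm{Lap}(b)$. This is an illustration only, not a hardened sampler: naive floating-point Laplace noise is known to leak information (Mironov, 2012). The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Lap(scale): the difference of two i.i.d. Exponential draws
    with mean `scale` (rate 1/scale) is Laplace-distributed with that scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(f_x, l1_sensitivity, epsilon, rng=None):
    """Return f(x) + (Z_1, ..., Z_d) with Z_i i.i.d. Lap(l1_sensitivity / epsilon).

    f_x is the already-computed vector f(x) and l1_sensitivity is Delta_1(f).
    """
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and l1_sensitivity must be positive")
    rng = rng or random.Random()
    scale = l1_sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in f_x]


# Example: a known-domain histogram where one individual changes one count by 1,
# so Delta_1(f) = 1; releasing it with epsilon = 0.5 uses Lap(2) noise.
print(laplace_mechanism([10, 3, 0], l1_sensitivity=1.0, epsilon=0.5, rng=random.Random(7)))
```

The zero count for the third label has to be present in the input vector; with an unknown domain there is no way to list every such label ahead of time.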

Theorems & Definitions (21)

  • Definition 2.1: DworkMcNiSm06, DworkKeMcMiNa06
  • Theorem 1: DworkMcNiSm06
  • Theorem 2: McSherryTa07
  • Definition 2.2: BunSt16, PapernotSt22
  • Theorem 3: BunSt16
  • Theorem 4: BunSt16
  • Theorem 5: CesarRo20
  • Theorem 6
  • Lemma 3.1
  • Proof
  • ...and 11 more
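Theorem 2 is credited to McSherryTa07, the work that introduced the exponential mechanism, and Exponential Mechanism variants are among the algorithm families the paper analyzes. Assuming Theorem 2 is the standard guarantee (sampling with probability proportional to $\exp(\varepsilon u(c) / (2\Delta))$ is $\varepsilon$-DP when every score $u(c)$ has sensitivity $\Delta$), a minimal known-domain sampler looks like the following. The candidate list and the `utility` function are hypothetical inputs, and the unknown-domain variants studied in the paper differ from this sketch.

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=None):
    """Pick one candidate c with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity)).

    `utility` is a hypothetical scoring function; `sensitivity` bounds how much
    any single score can change between neighboring datasets.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    rng = rng or random.Random()
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # shift by the max so math.exp cannot overflow
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example: choose a most frequent label from known counts (sensitivity 1).
counts = {"a": 50, "b": 45, "c": 2}
print(exponential_mechanism(list(counts), counts.get, epsilon=1.0, sensitivity=1.0,
                            rng=random.Random(3)))
```

Subtracting the maximum score before exponentiating leaves the sampling probabilities unchanged, since every weight is scaled by the same factor.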