A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy

Ryan Rogers

TL;DR

The paper addresses privacy-preserving release of histograms when the domain is unknown, a common practical challenge in SQL-like queries. It introduces a unified framework based on approximate concentrated differential privacy (CDP) and a three-part lemma to analyze diverse unknown-domain algorithms through differentiating events, a coupling mechanism $A'$, and pure CDP. The framework is then applied to multiple algorithm families, including Positive Count Histograms, Top-$(\bar{k}+1)$ Histograms, Exponential Mechanism variants, Pay-what-you-get compositions, and Continual Observation with the Binary Mechanism, yielding tighter CDP guarantees and simplifying composition. This approach facilitates safer integration of private histogram algorithms into existing data systems by providing robust, composable privacy guarantees with explicit noise and threshold choices. Overall, the work demonstrates how CDP-based analyses can unify and improve privacy bounds across a spectrum of unknown-domain histogram techniques with practical implications for data analytics pipelines.

Abstract

There are many existing differentially private algorithms for releasing histograms, i.e., counts with corresponding labels, in various settings. Our focus in this survey is to revisit some of the existing differentially private algorithms for releasing histograms over unknown domains, i.e., settings where the labels of the counts to be released are not known beforehand. The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels, that is, labels that are absent from the original histogram but could appear in the histogram of a hypothetical neighboring dataset. However, the challenge in designing differentially private algorithms for releasing histograms over an unknown domain is that some outcomes can reveal which input was used, which violates privacy. The goal then is to show that these differentiating outcomes occur with very low probability. We present a unified framework for the privacy analyses of several existing algorithms. Furthermore, our analysis uses approximate concentrated differential privacy from Bun and Steinke '16, which can improve the privacy loss parameters compared with using differential privacy directly, especially when composing many of these algorithms together in an overall system.
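The idea in the abstract can be illustrated with a minimal sketch of a thresholded histogram release. This is not one of the paper's algorithms: the function names, the use of Laplace noise, and the $(\varepsilon, \delta)$-DP threshold calibration are assumptions made here for illustration, whereas the paper carries out its analyses under approximate CDP. The sketch further assumes that each user contributes a count of 1 to at most one label. Only labels present in the data receive noise, and the threshold is set so that a label with count 1 (which could be absent in a neighboring dataset) is released with probability at most `delta`.

```python
import math
import random


def release_threshold(epsilon, delta):
    """Threshold tau with P(1 + Lap(1/epsilon) > tau) = delta (hypothetical helper)."""
    if epsilon <= 0 or not (0 < delta <= 0.5):
        raise ValueError("need epsilon > 0 and 0 < delta <= 0.5")
    # Laplace tail: P(Lap(b) > t) = 0.5 * exp(-t / b) for t >= 0, with b = 1/epsilon.
    return 1.0 + math.log(1.0 / (2.0 * delta)) / epsilon


def unknown_domain_histogram(counts, epsilon, delta, rng=None):
    """Release noisy counts only for labels that appear in `counts`.

    Assumption: each user adds 1 to at most one label, so a neighboring
    dataset changes a single count by 1.
    """
    rng = rng or random.Random()
    tau = release_threshold(epsilon, delta)
    released = {}
    for label, count in counts.items():
        if count <= 0:
            continue  # absent labels are never considered: the domain is unknown
        # Difference of two Exp(rate=epsilon) draws is Laplace with scale 1/epsilon.
        noisy = count + rng.expovariate(epsilon) - rng.expovariate(epsilon)
        if noisy > tau:
            released[label] = noisy
    return released


if __name__ == "__main__":
    data = {"apple": 120, "kiwi": 1, "plum": 45}
    print(unknown_domain_histogram(data, epsilon=1.0, delta=1e-6))
```

The differentiating outcome here is the release of a label that exists in only one of two neighboring datasets; the threshold bounds the probability of that event by `delta`, which mirrors the "low probability" goal stated in the abstract.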

Paper Structure (11 sections, 13 theorems, 20 equations, 4 algorithms)

Introduction
Preliminaries
Unifying Framework
Unknown Domain Algorithms
Positive Count Histograms
Top-$(\bar{k}+1)$ Count Histograms
Exponential Mechanism
Pay-what-you-get Composition
Continual Observation
Conclusion
Acknowledgements

Key Result

Theorem 1

Let $f: \mathcal{X} \to \mathbb{R}^d$ have $\ell_1$-sensitivity $\Delta_1(f)$. Then the mechanism $M: \mathcal{X} \to \mathbb{R}^d$ defined by $M(x) = f(x) + (Z_1, \ldots, Z_d)$, with $Z_1, \ldots, Z_d \stackrel{i.i.d.}{\sim} \mathrm{Lap}(\Delta_1(f)/\varepsilon)$, is $\varepsilon$-DP for any $\varepsilon>0$.
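As a rough illustration of Theorem 1, the following sketch adds i.i.d. Laplace noise with scale $\Delta_1(f)/\varepsilon$ to each coordinate of $f(x)$. The function names and the choice to sample with Python's standard library are assumptions made here, not taken from the paper, and floating-point sampling like this is not hardened against precision-based attacks.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Lap(scale) sample as the difference of two exponentials."""
    # expovariate takes a rate; rate = 1/scale gives an exponential with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, l1_sensitivity, epsilon, rng=None):
    """Return f(x) + (Z_1, ..., Z_d) with Z_i ~ Lap(l1_sensitivity / epsilon).

    `values` is the already-computed vector f(x); `l1_sensitivity` is
    Delta_1(f), which the caller must supply correctly.
    """
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and l1_sensitivity must be positive")
    rng = rng or random.Random()
    scale = l1_sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


if __name__ == "__main__":
    # Example: a 3-bin histogram where one user changes one bin by 1,
    # so the l1-sensitivity is 1 under that assumption.
    print(laplace_mechanism([10.0, 0.0, 4.0], l1_sensitivity=1.0, epsilon=0.5))
```

Note that this mechanism needs the full vector $f(x)$, including zero-count coordinates, which is exactly the requirement that unknown-domain algorithms avoid.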

Theorems & Definitions (21)

Definition 2.1: DworkMcNiSm06, DworkKeMcMiNa06
Theorem 1: DworkMcNiSm06
Theorem 2: McSherryTa07
Definition 2.2: BunSt16, PapernotSt22
Theorem 3: BunSt16
Theorem 4: BunSt16
Theorem 5: CesarRo20
Theorem 6
Lemma 3.1
proof
...and 11 more
