Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch

Christian Janos Lebeda; Jakub Tětek

TL;DR

This work develops a practical, memory-efficient method for privately releasing approximate histograms and heavy hitters from data streams by privatizing the Misra-Gries sketch (PMG). The core idea is to add per-counter Laplace noise plus a single global Laplace draw shared by all counters, and to apply a threshold that hides differences in the set of stored keys. This achieves $(\varepsilon,\delta)$-DP with a maximum error of $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ using $2k$ words of space, matching the best known private non-streaming error up to constant factors. The paper also extends the approach to pure DP via a post-processing step that reduces the sketch's sensitivity, analyzes merging of sketches under both trusted and untrusted aggregators, and introduces the PAMG sketch for user-level DP, which improves noise scaling via the Gaussian mechanism. Additionally, it outlines a Gaussian-based approach (GSHM) for settings where each user contributes up to $m$ distinct elements, reducing the noise to $O(\sqrt{m})$ under $(\varepsilon,\delta)$-DP, and highlights open problems for achieving similar gains under pure DP. Together, the results offer a practical, scalable toolkit for privacy-preserving streaming histograms and heavy hitters with near-optimal theoretical error bounds.
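To make the release step concrete, here is a minimal sketch of a PMG-style release applied to an already computed Misra-Gries sketch, written from the TL;DR's description rather than from the paper's pseudocode. The function names `laplace` and `release_pmg` are hypothetical. Two details are assumptions: the Laplace scale $1/\varepsilon$ for both the shared and the per-counter draws, and the `threshold` value, which the caller must supply (the paper derives the value needed for $(\varepsilon,\delta)$-DP; it is not reproduced here). Roughly, the shared draw is presumably there to mask changes that shift all counters at once, while the per-counter draws mask changes to a single counter.

```python
import random


def laplace(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release_pmg(sketch, eps, threshold, rng=None):
    """Hypothetical PMG-style release of a Misra-Gries sketch.

    sketch:    dict element -> counter, e.g. the output of Misra-Gries.
    eps:       privacy parameter; a Laplace scale of 1/eps is ASSUMED for
               both the shared and the per-counter noise.
    threshold: ASSUMED to be supplied by the caller; the value needed for
               (eps, delta)-DP is derived in the paper, not here.
    """
    if rng is None:
        rng = random.Random()
    shared = laplace(1 / eps, rng)  # one draw, added to every counter
    released = {}
    for key, count in sketch.items():
        noisy = count + shared + laplace(1 / eps, rng)  # plus independent noise
        if noisy >= threshold:  # suppress small noisy counters and their keys
            released[key] = noisy
    return released


noisy = release_pmg({"a": 1000, "b": 2}, eps=1.0, threshold=50.0,
                    rng=random.Random(0))
print(sorted(noisy))  # keys that survive the threshold
```

The released dict never has more entries than the input sketch, so a sketch with $k$ counters is released as at most $k$ keys and $k$ noisy counts.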

Abstract

We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch, but the amount of noise it adds can be large and scales linearly with the size of the sketch; the more accurate the sketch is, the more noise this approach has to add. We present a better mechanism for releasing a Misra-Gries sketch under $(\varepsilon,\delta)$-differential privacy. It adds noise with magnitude independent of the size of the sketch; in fact, the maximum error coming from the noise is the same as the best known in the private non-streaming setting, up to a constant factor. Our mechanism is simple and likely to be practical. We also give a simple post-processing step of the Misra-Gries sketch that does not increase the worst-case error guarantee. It suffices to add noise to this new sketch with less than twice the magnitude of the non-streaming setting. This improves on the previous result for $\varepsilon$-differential privacy, where the noise scales linearly with the size of the sketch. Finally, we consider a general setting where users can contribute multiple distinct elements. We present a new sketch with maximum error matching the Misra-Gries sketch. For many parameters in this setting, our sketch can be released with less noise under $(\varepsilon,\delta)$-differential privacy.
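The abstract builds on the classical Misra-Gries sketch. The following is a minimal Python version of the standard textbook algorithm (not code from the paper; `misra_gries` is a hypothetical name). It keeps at most $k$ counters; when a new element arrives while all counters are occupied, it decrements every counter and discards the element. Each such step removes $k+1$ stream items from the total count, so it happens at most $n/(k+1)$ times, which yields the $n/(k+1)$ error bound.

```python
from collections import Counter


def misra_gries(stream, k):
    """Standard Misra-Gries sketch using at most k counters.

    Returns a dict from stored elements to counters. The estimated
    frequency of an element is its counter, or 0 if it is not stored.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All k counters are in use: decrement each of them and drop
            # any that reach zero; the arriving element x is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = "aababc"
sketch = misra_gries(stream, k=2)
print(sketch)  # {'a': 2, 'b': 1}

# Every estimate underestimates the true count by at most n / (k + 1).
n, k = len(stream), 2
for x, f in Counter(stream).items():
    assert f - n / (k + 1) <= sketch.get(x, 0) <= f
```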

Paper Structure

This paper contains 11 sections, 25 theorems, 26 equations, 4 figures, and 4 algorithms.

Introduction
Technical overview
Preliminaries
Related work
Differentially Private Misra-Gries Sketch
Privatizing standard versions of the Misra-Gries sketch
Tips for practitioners
Pure Differential Privacy
Privatizing merged sketches
User-level Differential Privacy
Open Problem

Key Result

Theorem 1

The Private Misra-Gries (PMG) algorithm is $(\varepsilon,\delta)$-differentially private, uses $2k$ words of space, and returns a frequency oracle $\hat{f}$ with maximum error $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ with high probability, for sufficiently small $\delta$.
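The two terms in Theorem 1 have different sources: the first is the usual Misra-Gries error with $k$ counters, and the second is the privacy noise, which does not depend on $k$ (in contrast to the earlier approach of Chan et al., whose noise grows linearly with the sketch size). A worked illustration, with numbers chosen arbitrarily for the example:

```latex
\max_x \lvert \hat f(x) - f(x) \rvert
  \le \underbrace{\frac{n}{k+1}}_{\text{Misra-Gries}}
    + \underbrace{O\!\left(\frac{\log(1/\delta)}{\varepsilon}\right)}_{\text{privacy noise, independent of } k}
  \quad \text{(with high probability)}

% Example: n = 10^6, k = 999 gives n/(k+1) = 10^6 / 1000 = 1000.
% Increasing k to 1999 halves the first term to 500; the second term is unchanged.
```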

Figures (4)

Figure: Misra-Gries (MG)
Figure: Private Misra-Gries (PMG)
Figure: Misra-Gries Sketch Sensitivity Reduction
Figure: Privacy-Aware Misra-Gries ($\mathrm{PAMG}$)

Theorems & Definitions (49)

Theorem 1: simplified statement of the final theorem
Theorem 2: summary of the lemmas on the $\ell_2$-sensitivity of the Misra-Gries sketch, the error of the user-level sketch, and the sensitivity of the user-level sketch
Definition 3: Neighboring Streams
Definition 4: Differential Privacy [Dwork and Roth]
Definition 5: Laplace distribution
Definition 6: $\ell_p$-sensitivity
Lemma 8
proof
Lemma 9
proof
...and 39 more
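The TL;DR's $O(\sqrt{m})$ noise claim rests on the gap between $\ell_1$- and $\ell_2$-sensitivity (Definition 6). As a simplified illustration, assuming a plain histogram rather than the paper's PAMG or GSHM sketches: if one user contributes up to $m$ distinct elements, $m$ counts change by 1 each, so the $\ell_1$-sensitivity is $m$ but the $\ell_2$-sensitivity is only $\sqrt{m}$. The sketch below compares the Laplace scale $\Delta_1/\varepsilon$ with the classical Gaussian-mechanism calibration $\sigma = \Delta_2\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (Dwork and Roth, stated for $\varepsilon < 1$). All function names are hypothetical.

```python
import math


def histogram_sensitivities(m):
    """l1- and l2-sensitivity of a plain histogram when one user
    contributes up to m distinct elements (each changes a count by 1)."""
    return m, math.sqrt(m)


def laplace_scale(l1_sensitivity, eps):
    """Laplace mechanism: scale Delta_1 / eps gives eps-DP."""
    return l1_sensitivity / eps


def gaussian_sigma(l2_sensitivity, eps, delta):
    """Classical Gaussian mechanism (stated for eps < 1):
    sigma = Delta_2 * sqrt(2 ln(1.25 / delta)) / eps gives (eps, delta)-DP."""
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps


eps, delta = 0.5, 1e-6
for m in (1, 10, 100, 1000):
    l1, l2 = histogram_sensitivities(m)
    print(m, laplace_scale(l1, eps), round(gaussian_sigma(l2, eps, delta), 1))
```

For small $m$ the Laplace scale is smaller, but it grows linearly in $m$ while the Gaussian $\sigma$ grows only as $\sqrt{m}$. This is in line with the abstract's remark that the new sketch can be released with less noise "for many parameters" rather than for all of them.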
