Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch

Christian Janos Lebeda, Jakub Tětek

TL;DR

This work develops a practical, memory-efficient method for privately releasing approximate histograms and heavy hitters from data streams by privatizing the Misra-Gries sketch (PMG). The core idea is to add Laplace noise to each counter plus a single Laplace draw shared by all counters, and to apply a threshold that hides differences in the set of stored keys. This achieves $(\varepsilon,\delta)$-DP with a maximum error of $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ using $2k$ words of space, matching the best known private non-streaming error up to constant factors. The paper also extends the approach to pure DP via a post-processing step that reduces the sketch's sensitivity, analyzes merging of sketches under both trusted and untrusted aggregators, and introduces the Privacy-Aware Misra-Gries (PAMG) sketch for user-level DP, with improved noise scaling via the Gaussian mechanism. Additionally, it outlines a Gaussian-based approach (GSHM) for scenarios where users contribute multiple distinct elements, reducing the noise to $O(\sqrt{m})$ under $(\varepsilon,\delta)$-DP, and it highlights open problems in achieving similar gains under pure DP. Together, the results form a practical, scalable toolkit for privacy-preserving streaming histograms and heavy hitters with near-optimal theoretical error bounds.
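For readers unfamiliar with the underlying data structure, here is a minimal Python sketch of the classic, non-private Misra-Gries summary with $k$ counters. It is an illustration, not the paper's code, and the function name `misra_gries` is hypothetical. Storing $k$ keys and $k$ counts is where the $2k$ words of space come from, and every estimate undercounts the true frequency by at most $n/(k+1)$.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Classic Misra-Gries summary with at most k counters (hypothetical helper).

    For every element x with true frequency f(x) in a stream of length n:
        f(x) - n/(k+1) <= counters.get(x, 0) <= f(x)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1          # already tracked: increment its counter
        elif len(counters) < k:
            counters[x] = 1           # free slot: start tracking x
        else:
            # Sketch is full and x is new: decrement every stored counter
            # (x itself is implicitly decremented by not being stored),
            # and drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, `misra_gries("aaabbc", 2)` returns `{"a": 2, "b": 1}`: the final `"c"` finds the sketch full, so every counter is decremented once.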

Abstract

We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch, but the amount of noise it adds can be large and scales linearly with the size of the sketch; the more accurate the sketch is, the more noise this approach has to add. We present a better mechanism for releasing a Misra-Gries sketch under $(\varepsilon,\delta)$-differential privacy. It adds noise with magnitude independent of the size of the sketch; in fact, the maximum error coming from the noise is the same as the best known in the private non-streaming setting, up to a constant factor. Our mechanism is simple and likely to be practical. We also give a simple post-processing step of the Misra-Gries sketch that does not increase the worst-case error guarantee. It is sufficient to add noise to this new sketch with less than twice the magnitude of the non-streaming setting. This improves on the previous result for $\varepsilon$-differential privacy, where the noise scales linearly with the size of the sketch. Finally, we consider a general setting where users can contribute multiple distinct elements. We present a new sketch with maximum error matching the Misra-Gries sketch. For many parameters in this setting, our sketch can be released with less noise under $(\varepsilon,\delta)$-differential privacy.
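The release step described above can be sketched as follows, under stated assumptions. This is a hypothetical illustration of the idea (one shared Laplace draw added to every counter, an independent Laplace draw per counter, then suppression of counters below a threshold), not the authors' implementation. The noise scale $1/\varepsilon$ for both draws and the caller-supplied `threshold` are assumptions; the paper's exact threshold, which depends on $\varepsilon$ and $\delta$, is not reproduced here. The helper names `laplace` and `private_mg_release` are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mg_release(
    counters: Dict[Hashable, int],
    epsilon: float,
    threshold: float,
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, float]:
    """Hypothetical PMG-style release of a Misra-Gries summary.

    ASSUMPTIONS (not taken from the paper): both noise draws use scale
    1/epsilon, and `threshold` is chosen by the caller.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / epsilon
    shared = laplace(scale, rng)  # one global draw, added to every counter
    released: Dict[Hashable, float] = {}
    for key, count in counters.items():
        noisy = count + shared + laplace(scale, rng)  # plus a per-counter draw
        if noisy >= threshold:  # suppress small counters to hide key-set differences
            released[key] = noisy
    return released
```

The released dictionary acts as a frequency oracle in which an absent key is estimated as 0. For example, `private_mg_release({"a": 500, "b": 3}, epsilon=1.0, threshold=50.0)` will almost always keep `"a"` and drop `"b"`. Informally, the shared draw is aimed at neighboring inputs that shift all counters together and the per-counter draws at changes to a single counter; treat this as intuition rather than the paper's privacy proof.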

Paper Structure (11 sections, 25 theorems, 26 equations, 4 figures, 4 algorithms)

Key Result

Theorem 1

The Private Misra-Gries (PMG) mechanism is $(\varepsilon,\delta)$-differentially private, uses $2k$ words of space, and returns a frequency oracle $\hat{f}$ whose maximum error is $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ with high probability, provided $\delta$ is sufficiently small.
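As an informal reading of this bound (a decomposition for intuition, not the paper's proof), write $c_x$ for the Misra-Gries counter of element $x$ (zero if $x$ is not stored), $f(x)$ for its true frequency, and $\hat{f}(x)$ for the released estimate. The triangle inequality splits the error into a sketching part and a privacy part:

$$
|\hat{f}(x) - f(x)| \;\le\; \underbrace{|f(x) - c_x|}_{\le\, n/(k+1)\ \text{(Misra-Gries)}} \;+\; \underbrace{|\hat{f}(x) - c_x|}_{\text{noise and thresholding}}
$$

The first term is the classical deterministic guarantee $0 \le f(x) - c_x \le n/(k+1)$, and the additive $O(\log(1/\delta)/\varepsilon)$ term in Theorem 1 corresponds to the second. For instance, with $k = 999$ counters on a stream of $n = 10^6$ elements, the sketching part is at most $10^6/1000 = 1000$.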

Figures (4)

  • Figure: Misra-Gries (MG)
  • Figure: Private Misra-Gries (PMG)
  • Figure: Misra-Gries Sketch Sensitivity Reduction
  • Figure: Privacy-Aware Misra-Gries ($\mathrm{PAMG}$)

Theorems & Definitions (49)

  • Theorem 1: Simplified version of the paper's final theorem
  • Theorem 2: Summary of the lemmas on the $\ell_2$-sensitivity of the MG sketch and on the error and sensitivity of the user-level sketch
  • Definition 3: Neighboring Streams
  • Definition 4: Differential Privacy (Dwork and Roth)
  • Definition 5: Laplace distribution
  • Definition 6: $\ell_p$-sensitivity
  • Lemma 8
  • proof
  • Lemma 9
  • proof
  • ...and 39 more