Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch
Christian Janos Lebeda, Jakub Tětek
TL;DR
This work develops a practical, memory-efficient method for privately releasing approximate histograms and heavy hitters from data streams by privatizing the Misra-Gries sketch (PMG). The core idea adds per-counter Laplace noise plus a shared global Laplace draw and applies a threshold to hide key-set differences, achieving $(\varepsilon,\delta)$-DP with a maximum error of $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ and space $2k$, matching the best known private non-streaming error up to a constant factor. The paper also extends to pure DP via a small-sensitivity post-processing step, analyzes merging of sketches under both trusted and untrusted aggregators, and introduces the PAMG sketch for user-level DP with improved noise scaling via Gaussian mechanisms. Additionally, it outlines a Gaussian-based approach (GSHM) for scenarios where users contribute multiple distinct elements, reducing noise to $O(\sqrt{m})$ under $(\varepsilon,\delta)$-DP, and highlights open problems for achieving similar gains under pure DP. Together, the results offer a practical, scalable toolkit for privacy-preserving streaming histograms and heavy hitters with near-optimal error bounds.
Abstract
We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch, but the amount of noise it adds can be large and scales linearly with the size of the sketch; the more accurate the sketch is, the more noise this approach has to add. We present a better mechanism for releasing a Misra-Gries sketch under $(\varepsilon,\delta)$-differential privacy. It adds noise with magnitude independent of the size of the sketch; in fact, the maximum error coming from the noise is the same as the best known in the private non-streaming setting, up to a constant factor. Our mechanism is simple and likely to be practical. We also give a simple post-processing step of the Misra-Gries sketch that does not increase the worst-case error guarantee. It is sufficient to add noise to this new sketch with less than twice the magnitude of the non-streaming setting. This improves on the previous result for $\varepsilon$-differential privacy, where the noise scales linearly with the size of the sketch. Finally, we consider a general setting where users can contribute multiple distinct elements. We present a new sketch with maximum error matching the Misra-Gries sketch. For many parameters in this setting our sketch can be released with less noise under $(\varepsilon,\delta)$-differential privacy.
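As a rough illustration of the mechanism summarized above, the following Python sketch builds a textbook Misra-Gries sketch, adds independent Laplace noise to each counter plus one shared Laplace draw, and drops counters below a threshold. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names (`misra_gries`, `laplace`, `privatize`) are hypothetical, the default threshold of the form $1 + 2\ln(3/\delta)/\varepsilon$ is an assumption that should be checked against the paper, and the textbook sketch used here stands in for the specific Misra-Gries variant that the paper's privacy analysis relies on. The code should therefore not be treated as carrying a privacy guarantee.

```python
import math
import random


def misra_gries(stream, k):
    """Textbook Misra-Gries sketch with at most k counters.

    Each estimate undercounts the true frequency by at most n/(k+1),
    where n is the stream length.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Sketch is full: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(counters, epsilon, delta, rng=None, threshold=None):
    """Add per-counter and shared Laplace noise, then apply a threshold.

    ASSUMPTION: the default threshold 1 + 2*ln(3/delta)/epsilon is
    illustrative only; verify the constant against the paper.
    """
    rng = rng or random.Random()
    if threshold is None:
        threshold = 1.0 + 2.0 * math.log(3.0 / delta) / epsilon
    shared = laplace(1.0 / epsilon, rng)  # one draw added to every counter
    noisy = {}
    for key, count in counters.items():
        value = count + laplace(1.0 / epsilon, rng) + shared
        if value >= threshold:  # small counters are hidden
            noisy[key] = value
    return noisy


if __name__ == "__main__":
    stream = ["a"] * 600 + ["b"] * 300 + [str(i) for i in range(100)]
    random.Random(0).shuffle(stream)
    sketch = misra_gries(stream, k=8)
    print(privatize(sketch, epsilon=1.0, delta=1e-6, rng=random.Random(1)))
```

The noise scale in this sketch depends only on $\varepsilon$ and not on $k$, which mirrors the abstract's point that the noise magnitude is independent of the sketch size; the threshold is what contributes the $O(\log(1/\delta)/\varepsilon)$ term.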
