Scalable Differentially Private Sketches under Continual Observation
Rayne Holland
TL;DR
The paper tackles privacy-preserving frequency estimation and heavy hitter detection in data streams under continual observation. It introduces Lazy Sketch, a lazy-update framework that reduces per-update cost by a factor of O(w), where w is the sketch width, while maintaining differential privacy; it is complemented by a Gaussian Binary Mechanism for private counting. The approach improves throughput by up to 250x over previous continual-observation methods and shows solid utility on synthetic and real-world data. This work enables scalable, private streaming analytics without sacrificing real-time update performance.
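The TL;DR mentions a Gaussian Binary Mechanism for private counting. As a rough illustration only, the sketch below implements the textbook binary (tree) mechanism for continual counting, with Gaussian noise added to each dyadic partial sum. It is not taken from the paper. The class name `GaussianBinaryCounter` and the noise parameter `sigma` are hypothetical, and calibrating `sigma` to a privacy budget is deliberately left out. The relevant structural fact is that each stream element contributes to at most one partial sum per level, that is, to at most floor(log2 T) + 1 partial sums after T steps.

```python
import random


class GaussianBinaryCounter:
    """Textbook binary (tree) mechanism with Gaussian noise; illustrative only.

    `sigma` is a hypothetical noise parameter; it is NOT calibrated to any
    privacy budget here.
    """

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.t = 0
        self.psum = []   # exact dyadic partial sums, one slot per level
        self.noisy = []  # the same partial sums plus Gaussian noise

    def update(self, x):
        """Consume stream element x and return the noisy prefix sum up to time t."""
        self.t += 1
        # Level of the dyadic block that closes at time t = index of t's lowest set bit.
        level = (self.t & -self.t).bit_length() - 1
        while len(self.psum) <= level:
            self.psum.append(0.0)
            self.noisy.append(0.0)
        # Merge all lower-level blocks and x into a single block at `level`.
        self.psum[level] = sum(self.psum[:level]) + x
        for j in range(level):
            self.psum[j] = 0.0
            self.noisy[j] = 0.0
        # Fresh noise is drawn only for the newly closed block.
        self.noisy[level] = self.psum[level] + self.rng.gauss(0.0, self.sigma)
        return self.query()

    def query(self):
        # The prefix [1, t] is the disjoint union of the blocks at t's set bits.
        return sum(v for j, v in enumerate(self.noisy) if (self.t >> j) & 1)
```

With `sigma=0.0` the outputs equal the exact prefix sums, which is a convenient correctness check. Each call to `update` does O(log t) work, since it touches only the levels below the lowest set bit of t and the query sums at most one noisy value per level.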
Abstract
Linear sketches are fundamental tools in data stream analytics. They support both approximate frequency queries and heavy hitter detection, with provable trade-offs between error and memory. Importantly, on streams that contain sensitive information, linear sketches can be easily privatized by injecting a suitable amount of noise. This process is efficient in the single release model, where the output is released only at the end of the stream: it suffices to add noise to the sketch once. In contrast, in the continual observation model, where the output is released at every time step, fresh noise must be added to the sketch before each release, which creates substantial computational overhead. To address this, we introduce Lazy Sketch, a novel differentially private sketching method that employs lazy updates, perturbing and modifying only a small portion of the sketch at each step. Compared to prior work, we reduce the update complexity by a factor of $O(w)$, where $w$ is the width of the sketch. Experiments demonstrate that our method increases throughput by up to 250x over prior work, making differential privacy under continual observation practical for high-speed streaming applications.
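To make the single release model concrete, here is a minimal sketch of a Count-Min sketch privatized once with Laplace noise. It is a standard construction used for illustration, not the paper's Lazy Sketch. The class and method names (`PrivateCountMin`, `privatize`) and the BLAKE2b-based row hashing are assumptions. Under the assumed neighboring relation (adding or removing one unit-count update), an update changes exactly one counter in each of the d rows by 1, so the L1 sensitivity is d and Laplace noise with scale d/epsilon on every counter gives epsilon-differential privacy. Note that `privatize` touches all d·w counters; since the abstract states that fresh noise is needed before every release under continual observation, a noising pass of this size would recur at each time step.

```python
import hashlib
import random


class PrivateCountMin:
    """Count-Min sketch with d rows and w columns (illustrative, hypothetical API)."""

    def __init__(self, depth, width):
        self.d = depth
        self.w = width
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # Hypothetical data-independent row hash: BLAKE2b over "row:item", reduced mod w.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.w

    def update(self, item, count=1):
        # A linear update touches exactly one counter per row: O(d) work.
        for r in range(self.d):
            self.table[r][self._bucket(r, item)] += count

    def query(self, item):
        # Count-Min point estimate: minimum over the item's d counters.
        return min(self.table[r][self._bucket(r, item)] for r in range(self.d))

    def privatize(self, epsilon, rng):
        # Single-release privatization: L1 sensitivity is d (one counter per row
        # changes by 1), so add Laplace(d / epsilon) noise to all d * w counters.
        # Laplace(0, b) is sampled as the difference of two Exponential(mean b) draws.
        scale = self.d / epsilon
        noisy = PrivateCountMin(self.d, self.w)
        for r in range(self.d):
            for c in range(self.w):
                lap = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
                noisy.table[r][c] = self.table[r][c] + lap
        return noisy
```

Querying the sketch returned by `privatize` is post-processing of the noisy counters, so it preserves the privacy guarantee. The original sketch is left unchanged so that the exact and noisy estimates can be compared.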
