Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
Rachel Cummings, Alessandro Epasto, Jieming Mao, Tamalika Mukherjee, Tingting Ou, Peilin Zhong
TL;DR
This paper delivers the first sublinear-space differentially private algorithms for counting distinct elements in the turnstile streaming model with continual release, answering an open question from prior work. The core method combines a KSET-based distinct sample, a binary mechanism with Gaussian noise for private accumulation, and a blocklisting scheme that handles high-occurrence items. On arbitrary streams, it achieves a $(1+\eta)$-multiplicative approximation with $\tilde{O}_\eta(T^{1/3})$ space and additive error; when a bound $W$ on item occurrences is known, it uses $\tilde{O}_\eta(\sqrt{W})$ space and incurs $\tilde{O}_\eta(\sqrt{W})$ additive error. The analysis gives unconditional DP guarantees and a matching space lower bound of $\tilde{\Omega}(T^{1/3})$ for algorithms that use similar techniques, along with a near-optimal lower bound for blocklisting. These results substantially advance private, space-efficient continual-release computation over dynamic streams, with potential impact on real-time analytics under privacy constraints.
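The TL;DR names a binary mechanism with Gaussian noise as the private accumulation step. As a hedged illustration only, the sketch below implements the standard binary (tree) mechanism for continual counting, drawing one Gaussian noise value per dyadic block of time steps. It is not the paper's algorithm: the KSET distinct sample and blocklisting are omitted, the class name `GaussianBinaryMechanism` is hypothetical, and `sigma` is taken as given rather than calibrated to specific privacy parameters.

```python
import math
import random


class GaussianBinaryMechanism:
    """Continual release of prefix sums with the binary (tree) mechanism.

    Hypothetical illustration: each dyadic block of time steps receives one
    Gaussian noise draw, and the release at time t sums the noisy blocks that
    tile [1, t] (one block per 1-bit of t).
    """

    def __init__(self, T, sigma, rng=None):
        self.T = T
        self.sigma = sigma  # assumed already calibrated to the privacy target
        self.rng = rng if rng is not None else random.Random()
        self.levels = max(1, math.ceil(math.log2(T))) + 1
        self.alpha = [0.0] * self.levels  # exact sums of the open blocks
        self.noisy = [0.0] * self.levels  # noisy sums of the open blocks
        self.t = 0

    def update(self, x):
        """Consume x_t (e.g. +1, -1 or 0); return a noisy sum of x_1..x_t."""
        if self.t >= self.T:
            raise ValueError("stream longer than T")
        self.t += 1
        t = self.t
        i = (t & -t).bit_length() - 1  # index of the lowest 1-bit of t
        # The new level-i block absorbs all lower-level blocks plus x_t.
        self.alpha[i] = sum(self.alpha[:i]) + x
        for j in range(i):
            self.alpha[j] = 0.0
            self.noisy[j] = 0.0
        self.noisy[i] = self.alpha[i] + self.rng.gauss(0.0, self.sigma)
        # The open blocks at the 1-bit positions of t partition [1, t].
        return sum(self.noisy[j] for j in range(self.levels) if (t >> j) & 1)
```

With `sigma=0.0` the releases equal the exact prefix sums, which makes a convenient correctness check. With `sigma > 0`, each release sums at most `levels` noisy blocks, so its noise variance is at most `levels * sigma**2`, i.e. it grows only logarithmically in $T$.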
Abstract
The turnstile continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|U|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|U|$. In this paper, we give the first sublinear-space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. On arbitrary streams, our algorithm achieves $\tilde{O}_\eta(T^{1/3})$ space and additive error, together with a $(1+\eta)$-relative approximation, for all $\eta \in (0,1)$. This significantly improves upon the linear space required by state-of-the-art algorithms for this problem, while the additive error approaches the known $\Omega(T^{1/4})$ lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $\tilde{O}_\eta(\sqrt{W})$ additive error using $\tilde{O}_\eta(\sqrt{W})$ space. This additive error asymptotically matches that of prior work, which instead required linear space. Our results address an open question posed by [Jain, Kalemaj, Raskhodnikova, Sivakumar, Smith, NeurIPS 2023] about designing low-memory mechanisms for this problem. We complement these results with a space lower bound showing that any algorithm using similar techniques must use $\tilde{\Omega}(T^{1/3})$ space on arbitrary streams.
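To make the problem concrete, the sketch below computes, without privacy, the exact quantity a continual-release algorithm must approximate at every time step: the number of items whose frequency is nonzero after a stream of insertions and deletions. It is a naive baseline whose space is linear in the number of live items, not the paper's method. The `(item, delta)` encoding, `None` for an empty step, and the function names are hypothetical; reading $W$ as the number of updates that name a single item follows the abstract's informal wording, and the paper's formal definition may differ.

```python
from collections import Counter


def exact_distinct_counts(stream):
    """Exact count of items with nonzero frequency after every time step.

    `stream` is a list whose entries are (item, delta) pairs with delta = +1
    (insertion) or -1 (deletion), or None for a step with no update.
    Uses space linear in the number of live items.
    """
    freq = Counter()
    distinct = 0
    releases = []
    for update in stream:
        if update is not None:
            item, delta = update
            was_live = freq[item] != 0
            freq[item] += delta
            is_live = freq[item] != 0
            distinct += int(is_live) - int(was_live)
            if not is_live:
                del freq[item]  # drop items whose frequency returned to zero
        releases.append(distinct)
    return releases


def max_occurrences(stream):
    """Largest number of updates that name a single item (one reading of W)."""
    counts = Counter(update[0] for update in stream if update is not None)
    return max(counts.values(), default=0)
```

For the stream `[("a", 1), ("b", 1), ("a", 1), ("a", -1), None, ("b", -1), ("a", -1)]`, `exact_distinct_counts` returns `[1, 2, 2, 2, 2, 1, 0]` and `max_occurrences` returns 4.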
