Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model

Rachel Cummings, Alessandro Epasto, Jieming Mao, Tamalika Mukherjee, Tingting Ou, Peilin Zhong

TL;DR

This paper delivers the first sublinear-space differentially private algorithms for counting distinct elements in the turnstile streaming model with continual releases, addressing a gap identified in prior work. The core method combines a KSET-based distinct sample, a Gaussian-noised binary mechanism for private accumulation, and a blocklisting scheme to handle high-occurrence items, yielding a $(1+\eta)$-multiplicative approximation with additive error $\tilde{O}_\eta(T^{1/3})$ in general, and $\tilde{O}_\eta(\sqrt{W})$-space with $\tilde{O}_\eta(\sqrt{W})$ additive error when an occurrency bound $W$ is known. The analysis provides unconditional DP guarantees and a tight space lower bound of $\tilde{\Omega}(T^{1/3})$ under the employed technique, while also offering a near-optimal blocklisting lower bound. These results address an open problem and substantially advance private, space-efficient continual-release computation for dynamic streams, with potential impact on real-time analytics under privacy constraints.
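The Gaussian-noised binary mechanism named above can be illustrated with a minimal sketch of the standard binary (tree) mechanism for releasing running sums. This is not the paper's algorithm: the function name `binary_mechanism`, the input stream of increments, and the noise scale `sigma` are assumptions made for illustration, and calibrating `sigma` to a target privacy guarantee is not shown.

```python
import random


def binary_mechanism(increments, sigma, rng=None):
    """Release noisy prefix sums of `increments` at every time step.

    Hypothetical sketch: `sigma` is a free parameter here; choosing it to
    meet a given privacy budget is outside the scope of this example.
    """
    rng = rng or random.Random()
    levels = max(1, len(increments).bit_length())
    exact = [0.0] * levels   # exact partial sums over dyadic intervals
    noisy = [0.0] * levels   # the same partial sums with Gaussian noise
    releases = []
    for t, x in enumerate(increments, start=1):
        # Index of the lowest set bit of t: the dyadic interval closing now.
        i = (t & -t).bit_length() - 1
        # Merge all lower-level intervals plus the new increment.
        exact[i] = x + sum(exact[:i])
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + rng.gauss(0.0, sigma)
        # The prefix sum at time t is covered by the intervals matching
        # the set bits of t.
        releases.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return releases
```

Each release adds up at most one noisy partial sum per bit of `t`, so the noise variance in a single output grows with the logarithm of the stream length rather than linearly. With `sigma = 0` the function returns the exact running sums, which is a convenient sanity check.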

Abstract

The turnstile continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|U|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|U|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. Our algorithm achieves, on arbitrary streams, $\tilde{O}_\eta(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which are linear, while its additive error approaches the known $\Omega(T^{1/4})$ lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $\tilde{O}_\eta(\sqrt{W})$ additive error, using $\tilde{O}_\eta(\sqrt{W})$ space. This additive error asymptotically matches that of prior work, which instead required linear space. Our results address an open question posed by [Jain, Kalemaj, Raskhodnikova, Sivakumar, Smith, NeurIPS 2023] about designing low-memory mechanisms for this problem. We complement these results with a space lower bound for this problem, which shows that any algorithm that uses similar techniques must use space $\tilde{\Omega}(T^{1/3})$ on arbitrary streams.
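To make the problem statement concrete, here is a minimal non-private baseline that tracks the exact distinct count under insertions and deletions, together with the largest number of times any item appears in the stream. It uses space linear in the number of items seen, which is exactly what the paper aims to avoid. The function names, the `(item, sign)` update format, and the convention that an item counts as present when its net count is positive are assumptions made for illustration.

```python
from collections import Counter


def exact_distinct_counts(updates):
    """Return the exact distinct count after each turnstile update.

    `updates` is a list of (item, sign) pairs with sign in {+1, -1}.
    Assumption: an item is present when its net count is positive.
    """
    net = Counter()
    distinct = 0
    releases = []
    for item, sign in updates:
        before = net[item]
        after = before + sign
        net[item] = after
        if before <= 0 < after:
            distinct += 1      # item just became present
        elif after <= 0 < before:
            distinct -= 1      # item just became absent
        releases.append(distinct)
    return releases


def max_appearances(updates):
    """Largest number of updates touching a single item (the role of W)."""
    seen = Counter(item for item, _ in updates)
    return max(seen.values(), default=0)
```

For the hypothetical stream `[("a", +1), ("b", +1), ("a", -1), ("a", +1), ("b", -1)]`, the distinct counts are `[1, 2, 1, 2, 1]` and the largest number of appearances is 3.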

Paper Structure

This paper contains 33 sections, 33 theorems, 33 equations, 6 algorithms.

Key Result

Theorem 1

For all $\varepsilon, \eta>0$ and $\delta \in (0,1)$ and streams of length $T \in \mathbb{N}$, given a promised occurrency bound of $W_x$, there exists an $(\varepsilon,\delta)$-DP algorithm in the turnstile model under continual release that outputs a $(1 + \eta, \Tilde{O}_{\varepsilon,\delta,\eta}

Theorems & Definitions (62)

  • Definition 1: Occurrency
  • Theorem 1: Informal (see thm:cd-cond-inf)
  • Theorem 2: Informal (see Corollary corol:blocklist-ub and Theorem thm:lb)
  • Theorem 3: Main, informal (see thm:cd-uncond-inf)
  • Definition 2: Differential privacy [dwork2006calibrating]
  • Definition 3: Zero-concentrated differential privacy (zCDP) [BunS16]
  • Definition 4: Approximate zCDP [BunS16]
  • Theorem 4: Composition [BunS16]
  • Theorem 5: Relationship to DP [BunS16]
  • Definition 5: Sensitivity [dwork2006calibrating]
  • ...and 52 more