Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation

Palak Jain, Iden Kalemaj, Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith

TL;DR

This paper studies differentially private continual release of the number of distinct elements in the turnstile model, where items may be both inserted and deleted. Deletions make the problem much harder than in the insertion-only setting. The paper introduces the maximum flippancy $w_x$ of a stream $x$, the largest number of times that a single item's contribution to the distinct count changes, and gives an item-level private mechanism with additive error $O\left(\sqrt{w_x}\cdot\text{polylog }T\right)$ that requires no prior knowledge of $w_x$. The authors prove lower bounds for both event-level and item-level DP that match the upper bound up to polylogarithmic factors in many regimes, including a worst-case lower bound of $T^{1/4}$ that holds even under event-level privacy. The lower bounds connect private continual counting to foundational DP lower bounds via reductions from InnerProducts and Marginals. When the maximum flippancy is small, the error of the adaptive mechanism remains polylogarithmic in $T$, comparable to the insertion-only setting.

Abstract

Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic, the number of distinct items, in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms have additive error just polylogarithmic in the length of the stream $T$. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$ even under a relatively weak, event-level privacy definition. Then, we identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(\sqrt{w} \cdot \text{polylog } T)$ additive error, without requiring prior knowledge of $w$. We prove that this is the best achievable error bound that depends only on $w$, for a large range of values of $w$. When $w$ is small, the error of our mechanism is similar to the polylogarithmic in $T$ error in the insertion-only setting, bypassing the hardness in the turnstile model.
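To make the two quantities in the abstract concrete, the following is a minimal, non-private sketch that computes the exact distinct count at every time step and the maximum flippancy of a turnstile stream. It adds no noise and is not the paper's mechanism. The stream encoding (`("+", item)`, `("-", item)`, or `None` for a step with no update), the function name `count_distinct_and_flippancy`, and the convention that an item is present exactly when its most recent update was an insertion are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical encoding: ("+", item) inserts, ("-", item) deletes,
# None means no update at this time step.
Update = Optional[Tuple[str, str]]


def count_distinct_and_flippancy(stream: Sequence[Update]) -> Tuple[List[int], int]:
    """Return (exact distinct counts per step, maximum flippancy).

    Assumption: an item is present iff its latest update was an insertion,
    so re-inserting a present item or deleting an absent one changes nothing.
    """
    present: Set[str] = set()
    flips: Dict[str, int] = {}   # per-item number of presence changes
    counts: List[int] = []

    for update in stream:
        if update is not None:
            op, item = update
            if op not in ("+", "-"):
                raise ValueError(f"unknown operation: {op!r}")
            if op == "+" and item not in present:
                present.add(item)
                flips[item] = flips.get(item, 0) + 1
            elif op == "-" and item in present:
                present.remove(item)
                flips[item] = flips.get(item, 0) + 1
        counts.append(len(present))

    return counts, max(flips.values(), default=0)


# Example: item "a" is inserted, deleted, and inserted again (3 flips).
stream = [("+", "a"), ("+", "b"), ("-", "a"), None,
          ("+", "a"), ("+", "b"), ("-", "c")]
counts, max_flip = count_distinct_and_flippancy(stream)
```

In this example the redundant insertion of `"b"` and the deletion of the absent `"c"` do not change any item's contribution to the count, so they do not add to the flippancy.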

Paper Structure (28 sections, 25 theorems, 37 equations, 1 table, 5 algorithms)


Key Result

Theorem 1.5

For all $\varepsilon,\delta \in (0,1]$ and sufficiently large $T \in \mathbb N$, there exists an $(\varepsilon, \delta)$-item-level-DP mechanism for $\mathsf{CountDistinct}$ that is $\alpha$-accurate for all turnstile streams $x$ of length $T$, where $\alpha = O\left(\sqrt{w_x}\cdot\text{polylog }T\right)$ (suppressing the dependence on $\varepsilon$ and $\delta$) and $w_x$ is the maximum flippancy of the stream $x$.
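
As a rough reading of how the two bounds stated in the abstract relate, the flippancy-parameterized error falls below the worst-case $T^{1/4}$ bound exactly when the maximum flippancy is below $\sqrt{T}$. This comparison ignores constants, polylogarithmic factors in $T$, and the dependence on $\varepsilon$ and $\delta$; those simplifications are assumptions of this sketch, not statements from the paper.

$$\sqrt{w_x} \le T^{1/4} \iff w_x \le T^{1/2}.$$

For example, a stream in which each item is inserted once and deleted at most once has $w_x \le 2$, so the parameterized error is polylogarithmic in $T$.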

Theorems & Definitions (62)

  • Definition 1.1: Existence vector, $\mathsf{CountDistinct}$
  • Definition 1.2: Error of an answer vector and error of a mechanism for $\mathsf{CountDistinct}$
  • Definition 1.3: Neighboring streams
  • Definition 1.4: Flippancy
  • Theorem 1.5: Upper bound
  • Theorem 1.6: Event-level lower bound
  • Theorem 1.7: Item-level lower bound
  • Definition 2.1: $(\varepsilon, \delta)$-indistinguishability
  • Definition 2.2: $\ell$-Neighboring streams
  • Lemma 2.3: Group privacy [DworkMNS16]
  • ...and 52 more