DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding Windows (Technical Report)

Yiping Wang, Yanhao Wang, Cen Chen

TL;DR

DPSW-Sketch is a sliding-window framework based on the count-min sketch. It satisfies differential privacy over the stream and answers frequency and heavy-hitter queries within bounded errors, using time and space sublinear in the window size $w$.

Abstract

The sliding window model of computation captures scenarios in which data arrive continually as a stream and only the most recent $w$ items are used for analysis. In this setting, an algorithm must accurately track the desired statistics over the sliding window using little space. When data streams contain sensitive information about individuals, the algorithm must also provide a provable privacy guarantee. In this paper, we focus on two fundamental problems in the sliding window model: privately (1) estimating the frequency of an arbitrary item and (2) identifying the most frequent items (i.e., heavy hitters). We propose DPSW-Sketch, a sliding window framework based on the count-min sketch that not only satisfies differential privacy over the stream but also approximates the results of frequency and heavy-hitter queries within bounded errors in time and space sublinear in $w$. Extensive experiments on five real-world and synthetic datasets show that DPSW-Sketch provides significantly better utility-privacy trade-offs than state-of-the-art methods.
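To make the setting concrete, the following is a minimal Python sketch of the building blocks named in the abstract: a plain count-min sketch, a naive way of restricting it to the most recent $w$ items, and a one-shot Laplace perturbation of its counters. It is not the DPSW-Sketch algorithm. The names `CountMinSketch`, `sketch_last_w`, and `add_laplace_noise` are hypothetical, the window is buffered explicitly (space linear in $w$, which is the cost a sublinear framework is meant to avoid), and the noise scale assumes that neighbouring inputs differ by adding or removing a single item.

```python
import hashlib
import random
from collections import deque


class CountMinSketch:
    """Plain count-min sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash function per row by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item, delta=1.0):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += delta

    def estimate(self, item):
        # Without noise, the minimum over rows never underestimates the count.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def add_laplace_noise(self, epsilon, rng):
        # Assumption: neighbouring inputs differ by adding or removing one item,
        # which changes one counter per row by 1, so the L1 sensitivity is depth.
        scale = self.depth / epsilon
        for row in self.table:
            for j in range(self.width):
                # The difference of two i.i.d. exponentials is Laplace(0, scale).
                row[j] += rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sketch_last_w(stream, w, depth=4, width=64):
    """Naive baseline: sketch of the most recent w items, buffering the window."""
    sketch = CountMinSketch(depth, width)
    window = deque()
    for item in stream:
        window.append(item)
        sketch.update(item, +1.0)
        if len(window) > w:
            sketch.update(window.popleft(), -1.0)  # expire the oldest item
    return sketch
```

A possible use: `sketch_last_w(list("abacaab"), w=4)` sketches the window `c, a, a, b`. Calling `add_laplace_noise(1.0, random.Random(0))` then perturbs every counter once. This covers a single release only and says nothing about privacy under continual queries as the window slides.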

Paper Structure

This paper contains 32 sections, 7 theorems, 1 equation, 14 figures, 2 tables, 3 algorithms.

Key Result

theorem 1

Given an $(\alpha_1, \alpha_2)$-smooth function $g$ for two parameters $0 < \alpha_2 \leq \alpha_1 < 1$, suppose that there is an insert-only streaming algorithm $\mathcal{A}$ that produces a $\gamma$-approximation of $g$ using space $\mathcal{S}$ and update time $\mathcal{T}$. Then, there exists a

Figures (14)

  • Figure 1: Overview of the DPSW-Sketch framework.
  • Figure 2: Performance for frequency queries on high-frequency items by varying privacy parameter $\varepsilon \in \{0.1, 0.2, 0.4, \dots, 2.0\}$.
  • Figure 3: Performance for frequency queries on low-frequency items by varying privacy parameter $\varepsilon \in \{0.1, 0.2, 0.4, \dots, 2.0\}$.
  • Figure 4: Performance for heavy-hitter queries by varying privacy parameter $\varepsilon$ and threshold $\gamma$.
  • Figure 5: Performance of different sketches for frequency and heavy-hitter queries by varying window size $w$.
  • ...and 9 more figures

Theorems & Definitions (9)

  • theorem 1
  • theorem 2: zCDP $\Rightarrow$ DP
  • theorem 3
  • definition 1: $(\xi, \eta)$-Approximate Frequency
  • definition 2: $(\xi, \eta)$-Approximate $\gamma$-Heavy Hitters [CormodeM05, abs-2302-11081]
  • lemma 1
  • lemma 2
  • lemma 3
  • theorem 4