Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy

Rong Du, Qingqing Ye, Yaxin Xiao, Liantong Yu, Yue Fu, Haibo Hu

TL;DR

This work tackles publishing real-time streams under local differential privacy with a focus on $w$-event privacy. It introduces a dual-use perturbation paradigm and three algorithms (IPP, APP, and CAPP) that use perturbed results to calibrate subsequent perturbations, achieving improved utility while preserving $w$-event DP. The authors further extend the approach with time-slot sampling (PP-S) and demonstrate, through extensive experiments on four real datasets, that CAPP delivers the best overall utility for subsequence means and stream publication. The results suggest a practical, high-utility path for privacy-preserving stream analysis and reveal insights on clipping ranges and high-dimensional extensions, with broad applicability to crowd-level statistics and other LDP mechanisms.

Abstract

Stream data from real-time distributed systems such as IoT, tele-health, and crowdsourcing has become an important data source. However, the collection and analysis of user-generated stream data raise privacy concerns due to the potential exposure of sensitive information. To address these concerns, local differential privacy (LDP) has emerged as a promising standard. Nevertheless, applying LDP to stream data presents significant challenges, as stream data often involves a large or even infinite number of values. Allocating a given privacy budget across these data points would introduce overwhelming LDP noise to the original stream data. Beyond existing approaches that merely use perturbed values for estimating statistics, our design leverages them for both perturbation and estimation. This dual utilization arises from a key observation: each user knows their own ground truth and perturbed values, enabling a precise computation of the deviation error caused by perturbation. By incorporating this deviation into the perturbation process of subsequent values, the previous noise can be calibrated. Following this insight, we introduce the Iterative Perturbation Parameterization (IPP) method, which utilizes current perturbed results to calibrate the subsequent perturbation process. To enhance the robustness of calibration and reduce sensitivity, two algorithms, namely Accumulated Perturbation Parameterization (APP) and Clipped Accumulated Perturbation Parameterization (CAPP), are further developed. We prove that these three algorithms satisfy $w$-event differential privacy while significantly improving utility. Experimental results demonstrate that our techniques outperform state-of-the-art LDP stream publishing solutions in terms of utility, while retaining the same privacy guarantee.
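The deviation-feedback idea in the abstract can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' IPP algorithm: it assumes values normalized to [0, 1], swaps in Laplace noise as a stand-in for the paper's perturbation mechanism, and uses hypothetical names (`publish_with_feedback`, `epsilon_per_slot`, the optional `noise` hook). It makes no claim about the privacy guarantee; it only shows how a user who knows both the true and the perturbed value can feed the last deviation into the next input.

```python
import random
from typing import Callable, List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_with_feedback(
    stream: Sequence[float],
    epsilon_per_slot: float,
    rng: random.Random,
    noise: Optional[Callable[[], float]] = None,
) -> List[float]:
    """Perturb each value after shifting it by the previous deviation.

    Assumptions (not from the paper): inputs lie in [0, 1] and the
    perturbation is additive Laplace noise with scale 1 / epsilon_per_slot.
    """
    if noise is None:
        scale = 1.0 / epsilon_per_slot
        noise = lambda: laplace_noise(scale, rng)

    deviation = 0.0  # true value minus published value at the previous slot
    published: List[float] = []
    for x in stream:
        # Shift the input by the last deviation, then clip back to [0, 1].
        shifted = min(1.0, max(0.0, x + deviation))
        y = shifted + noise()
        published.append(y)
        # The user knows both x and y, so the deviation is computed exactly.
        deviation = x - y
    return published


# Usage: three slots of a constant stream with a per-slot budget of 0.5.
outputs = publish_with_feedback([0.5, 0.5, 0.5], 0.5, random.Random(7))
```

With a constant noise of +0.2 injected through the `noise` hook, the second output drops back by the same amount, so the overshoot of the first slot is cancelled in the running sum. That is the calibration effect the abstract describes, shown here in its simplest form.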

Paper Structure

This paper contains 32 sections, 6 theorems, 78 equations, 11 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

(Sequential Composition) For any $k$ mechanisms, where the $i$-th mechanism provides $\epsilon_i$-local differential privacy, the sequence of all these mechanisms provides $\epsilon_{\text{seq}}$-local differential privacy, where $\epsilon_{\text{seq}} = \sum_{i=1}^{k} \epsilon_i$.
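As a small worked example of sequential composition, the sketch below adds up per-mechanism budgets and shows a uniform split of a total budget over a window of $w$ time slots. The uniform split is only a baseline assumption for illustration, not necessarily the allocation used in the paper, and the function names are hypothetical.

```python
from typing import Sequence


def sequential_epsilon(epsilons: Sequence[float]) -> float:
    # Under sequential composition the individual budgets add up.
    return sum(epsilons)


def uniform_slot_budget(total_epsilon: float, w: int) -> float:
    # Baseline assumption: split the total budget evenly over w slots,
    # so that any window of w consecutive releases consumes total_epsilon.
    if w <= 0:
        raise ValueError("window size w must be positive")
    return total_epsilon / w


# Usage: a total budget of 1.0 over a window of 4 slots.
per_slot = uniform_slot_budget(1.0, 4)
window_total = sequential_epsilon([per_slot] * 4)
```

The per-slot budget shrinks as the window grows, which is why a naive split injects heavy noise on long windows.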

Figures (11)

  • Figure 1: Illustration of the data streams collection framework
  • Figure 2: The procedure of IPP
  • Figure 3: Illustration of the sampling
  • Figure 4: MSE comparison w.r.t. $\epsilon$ for perturbation parameterization based algorithms vs SW-direct
  • Figure 5: Cosine distance comparison w.r.t. $\epsilon$ for perturbation parameterization algorithms vs SW-direct
  • ...and 6 more figures

Theorems & Definitions (17)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Definition 2
  • Definition 3
  • proof
  • proof
  • Theorem 3
  • proof
  • proof
  • ...and 7 more