DPI: Ensuring Strict Differential Privacy for Infinite Data Streaming
Shuya Feng, Meisam Mohammady, Han Wang, Xiaochen Li, Zhan Qin, Yuan Hong
TL;DR
DPI addresses the challenge of protecting user-level privacy in infinite data streams by bounding cumulative privacy loss while maintaining data utility. It combines sensitivity compression via a PDF representation, a 0-DP synopsis pool, and a bi-directional DPI boosting mechanism with a Random Budget Allocation scheme to ensure a converging privacy budget. The framework provides formal privacy guarantees with a logarithmic growth of the total privacy budget and derives an optimal boosting strategy to maximize utility under DP constraints. Extensive experiments on real and synthetic streams demonstrate DPI’s strong privacy protection and high utility across applications such as statistical queries, anomaly detection, and recommender systems. This work enables practical, privacy-preserving real-time analytics for unbounded data streams.
Abstract
Streaming data, crucial for applications such as crowdsourcing analytics, behavior studies, and real-time monitoring, faces significant privacy risks because large and diverse data are linked to individuals. In particular, recent efforts to release data streams under the rigorous privacy notion of differential privacy (DP) have encountered unbounded privacy leakage. This limits their applicability to a finite number of time slots ("finite data stream") or forces a relaxation that protects individual events ("event or $w$-event DP") rather than all records of each user. A persistent challenge is managing the sensitivity of outputs to inputs when users contribute many activities and data distributions evolve over time. In this paper, we present a novel technique for Differentially Private data streaming over Infinite disclosure (DPI) that effectively bounds the total privacy leakage of each user in infinite data streams while enabling accurate data collection and analysis. We further maximize the accuracy of DPI via a novel boosting mechanism. Finally, extensive experiments across various streaming applications and real datasets (e.g., COVID-19, Network Traffic, and USDA Production) show that DPI maintains high utility for infinite data streams in diverse settings. Code for DPI is available at https://github.com/ShuyaFeng/DPI.
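
To make the unbounded-leakage problem concrete, recall standard sequential composition: if the release at time slot $t$ satisfies $\epsilon_t$-DP, a user who contributes to every slot incurs a total privacy loss of at most $\sum_t \epsilon_t$. The worked bound below is only an illustration. The symbol $\epsilon_0$ and the geometric schedule are assumptions chosen for simplicity and are not DPI's budget allocation scheme.

$$
\epsilon_t = \epsilon_0 \;\Rightarrow\; \sum_{t=1}^{T} \epsilon_t = T\,\epsilon_0 \to \infty \quad (T \to \infty),
\qquad
\epsilon_t = \epsilon_0\, 2^{-t} \;\Rightarrow\; \sum_{t=1}^{\infty} \epsilon_t = \epsilon_0 .
$$

A constant per-slot budget therefore cannot bound a user's loss over an infinite stream, whereas a summable schedule can. The price is that later slots receive ever smaller budgets and hence noisier releases, which is why utility becomes the central concern in the infinite setting.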
