Privacy for Free: Leveraging Local Differential Privacy Perturbed Data from Multiple Services
Rong Du, Qingqing Ye, Yue Fu, Haibo Hu
TL;DR
This work addresses the privacy burden that arises when multiple services each collect numerical data from the same users under Local Differential Privacy (LDP). Instead of requiring fresh perturbed reports, it aggregates the perturbed results that different services have already collected. It introduces three methods: Unbiased Averaging (UA) and User-level Weighted Averaging (UWA) for mean estimation, and User-level Likelihood Estimation (ULE) for distribution estimation, all designed to be agnostic to perturbation mechanisms and privacy budgets. UA converts biased perturbed values into unbiased estimates before averaging, UWA weights each contribution using Bayesian-informed posterior variances, and ULE uses an EM-based maximum likelihood framework to recover the original distribution across users. Experiments on synthetic and real datasets show substantial accuracy improvements over single-service baselines, indicating that cross-service fusion of perturbed data can yield more accurate statistics while preserving user privacy.
Abstract
Local Differential Privacy (LDP) has emerged as a widely adopted privacy-preserving technique in modern data analytics, enabling users to share statistical insights while maintaining robust privacy guarantees. However, current LDP applications assume a single service gathering perturbed information from users. In reality, multiple services may be interested in collecting users' data, which imposes a growing privacy burden on users as more such services emerge. To address this issue, this paper proposes a framework for collecting and aggregating data based on perturbed information from multiple services, regardless of the statistics they estimate (e.g., mean or distribution) and the perturbation mechanisms they use. For mean estimation, we introduce the Unbiased Averaging (UA) method and its optimized version, User-level Weighted Averaging (UWA). The former utilizes biased perturbed data, while the latter assigns weights to different perturbed results based on perturbation information, thereby achieving minimal variance. For distribution estimation, we propose User-level Likelihood Estimation (ULE), which treats all perturbed results from a user as a whole for maximum likelihood estimation. Experimental results demonstrate that our framework and its constituent methods significantly improve the accuracy of both mean and distribution estimation.
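To illustrate why weighting perturbed results by their perturbation information can reduce variance, the following is a minimal sketch, not the paper's UWA method. It assumes each service uses the Laplace mechanism on values in [-1, 1] (so every report is already unbiased) and substitutes plain inverse-variance weighting for the Bayesian-informed weights described above. The function names, privacy budgets, and synthetic data are all hypothetical.

```python
import numpy as np

def laplace_perturb(values, eps, rng):
    """Perturb values in [-1, 1] with Laplace noise of scale 2/eps (sensitivity 2)."""
    return values + rng.laplace(loc=0.0, scale=2.0 / eps, size=values.shape)

def laplace_variance(eps):
    """Noise variance of the mechanism above: 2 * scale**2."""
    return 2.0 * (2.0 / eps) ** 2

def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1."""
    inv = 1.0 / np.asarray(variances, dtype=float)
    return inv / inv.sum()

def combined_variance(variances, weights):
    """Noise variance of a weighted sum of independent unbiased reports."""
    return float(np.sum(np.asarray(weights) ** 2 * np.asarray(variances)))

def weighted_mean_estimate(reports, weights):
    """Combine each user's reports (rows) across services (columns), then average over users."""
    return float((reports @ weights).mean())

# Assumed setup: three services with different budgets perturb the same users' values.
rng = np.random.default_rng(0)
true_values = rng.uniform(-1.0, 1.0, size=20_000)
budgets = [0.5, 1.0, 2.0]  # hypothetical privacy budgets, one per service
reports = np.column_stack([laplace_perturb(true_values, e, rng) for e in budgets])
variances = np.array([laplace_variance(e) for e in budgets])

uniform = np.full(len(budgets), 1.0 / len(budgets))
weights = inverse_variance_weights(variances)

print("true mean:         ", true_values.mean())
print("uniform averaging: ", weighted_mean_estimate(reports, uniform))
print("weighted averaging:", weighted_mean_estimate(reports, weights))
print("per-user noise variance (uniform, weighted):",
      combined_variance(variances, uniform), combined_variance(variances, weights))
```

Under these assumptions the weighted combination has lower per-user noise variance than both uniform averaging and the best single service, which matches the intuition behind moving from UA to UWA. Mechanisms whose outputs are biased would first need the debiasing step that UA provides.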
