Multi-class Item Mining under Local Differential Privacy
Yulian Mao, Qingqing Ye, Rong Du, Qi Wang, Kai Huang, Haibo Hu
TL;DR
This paper tackles multi-class item mining under local differential privacy by preserving the link between class labels and items. It introduces two foundational frameworks, PTJ (joint perturbation of label and item) and PTS (separate perturbation of label and item), augmented by two optimization modules: validity perturbation, which handles invalid label–item pairs, and correlated perturbation, which maintains label–item relationships. The authors derive unbiased frequency estimators and tailor optimized perturbations for two queries, multi-class frequency estimation and multi-class top-$k$ item mining, supported by extensive theoretical analysis and experiments on real and synthetic datasets. Key findings show that validity perturbation reduces noise from invalid data and that correlated perturbation improves utility, especially at lower privacy budgets. The multi-class methods significantly outperform naive baselines such as HEC, with PTJ excelling in some scenarios and PTS-CP delivering strong performance at lower communication cost. The work enables practical, privacy-preserving, class-specific item statistics for personalized recommendations and downstream learning tasks under LDP.
Abstract
Item mining, a fundamental task for collecting statistical data from users, has raised increasing privacy concerns. To address these concerns, local differential privacy (LDP) was proposed as a privacy-preserving technique. Existing LDP item mining mechanisms focus primarily on global statistics, i.e., statistics computed over the entire dataset. However, global statistics fall short in user-tailored tasks such as personalized recommendation, where class-wise statistics provide finer-grained information that can improve task accuracy. At the same time, introducing class labels brings new challenges: label perturbation may yield invalid label–item pairs that distort aggregation. To this end, we propose frameworks for multi-class item mining, along with two mechanisms: validity perturbation to reduce the impact of invalid data, and correlated perturbation to preserve the relationship between labels and items. We also apply these optimized methods to two multi-class item mining queries: frequency estimation and top-$k$ item mining. Through theoretical analysis and extensive experiments, we verify the effectiveness of these methods and their superiority over baseline approaches.
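To make the distinction between joint (PTJ) and separate (PTS) perturbation concrete, the following is a minimal sketch under stated assumptions; it is not the authors' construction. It assumes generalized randomized response (GRR) as the base frequency oracle, assumes the separate variant splits the budget as `eps_label + eps_item` by sequential composition, and omits the validity and correlated perturbation modules entirely. All function names (`grr_params`, `grr_perturb`, `grr_invert`, `joint_estimate`, `separate_estimate`) and the toy data are hypothetical. Both estimators are unbiased for the class-wise item counts because they apply the exact inverse of the (linear) perturbation channel.

```python
import math
import random
from collections import Counter

def grr_params(epsilon, d):
    """Keep probability p and per-value flip probability q of GRR on a domain of size d."""
    e = math.exp(epsilon)
    return e / (e + d - 1), 1.0 / (e + d - 1)

def grr_perturb(value, d, epsilon, rng):
    """Keep the true value with probability p, otherwise report a uniform *other* value."""
    p, _ = grr_params(epsilon, d)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)
    return other if other < value else other + 1

def grr_invert(x, p, q):
    """Apply the inverse GRR matrix (I - q * 11^T) / (p - q) to a count vector x."""
    s = sum(x)
    return [(xi - q * s) / (p - q) for xi in x]

def joint_estimate(pairs, n_labels, n_items, epsilon, rng):
    """Hypothetical PTJ-style sketch: GRR on the flattened (label, item) domain, full budget."""
    d = n_labels * n_items
    reports = Counter(grr_perturb(c * n_items + v, d, epsilon, rng) for c, v in pairs)
    p, q = grr_params(epsilon, d)
    flat = grr_invert([reports[k] for k in range(d)], p, q)
    return [flat[c * n_items:(c + 1) * n_items] for c in range(n_labels)]

def separate_estimate(pairs, n_labels, n_items, eps_label, eps_item, rng):
    """Hypothetical PTS-style sketch: perturb label and item independently.

    Assumed total budget is eps_label + eps_item (sequential composition).
    """
    y = [[0] * n_items for _ in range(n_labels)]
    for c, v in pairs:
        y[grr_perturb(c, n_labels, eps_label, rng)][grr_perturb(v, n_items, eps_item, rng)] += 1
    p1, q1 = grr_params(eps_label, n_labels)
    p2, q2 = grr_params(eps_item, n_items)
    # E[y] = M_label * F * M_item^T, so undo the label channel column by column,
    # then the item channel row by row (both GRR matrices are symmetric).
    cols = [grr_invert([y[c][v] for c in range(n_labels)], p1, q1) for v in range(n_items)]
    z = [[cols[v][c] for v in range(n_items)] for c in range(n_labels)]
    return [grr_invert(row, p2, q2) for row in z]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 2 class labels, 3 items, one (label, item) pair per user.
    pairs = [(0, 0)] * 6000 + [(0, 1)] * 2000 + [(1, 2)] * 4000 + [(1, 0)] * 1000
    print(joint_estimate(pairs, 2, 3, 4.0, rng))
    print(separate_estimate(pairs, 2, 3, 2.0, 2.0, rng))
```

In this sketch the trade-off appears in the estimators: joint GRR pays for a domain of size (number of labels) × (number of items), which shrinks p relative to q as the domain grows, while the separate variant keeps each domain small but splits the budget and compounds two noise channels, so the inversion amplifies variance by roughly 1 / ((p1 - q1)(p2 - q2))^2. Individual estimates can be negative; clipping or other post-processing would typically follow.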
