Multi-class Item Mining under Local Differential Privacy

Yulian Mao, Qingqing Ye, Rong Du, Qi Wang, Kai Huang, Haibo Hu

TL;DR

This paper tackles multi-class item mining under local differential privacy (LDP) by preserving the link between class labels and items. It introduces two foundational frameworks, PTJ (joint perturbation of the label–item pair) and PTS (separate perturbation of label and item), augmented by two optimization modules: validity perturbation, which handles invalid label–item pairs, and correlated perturbation, which maintains label–item relationships. The authors derive unbiased frequency estimators and tailor the optimized perturbations to two queries, multi-class frequency estimation and multi-class top-$k$ item mining, supported by theoretical analysis and experiments on real and synthetic datasets. Key findings show that validity perturbation reduces noise from invalid data and that correlated perturbation improves utility, especially at lower privacy budgets. The multi-class methods significantly outperform strawman baselines such as HEC, with PTJ excelling in some scenarios and PTS-CP delivering strong performance at lower communication cost. The work enables practical, privacy-preserving, class-specific item statistics for personalized recommendations and downstream learning tasks under LDP.

Abstract

Item mining, a fundamental task for collecting statistical data from users, has raised increasing privacy concerns. To address these concerns, local differential privacy (LDP) was proposed as a privacy-preserving technique. Existing LDP item mining mechanisms primarily concentrate on global statistics, i.e., those from the entire dataset. Nevertheless, they fall short of user-tailored tasks such as personalized recommendations, whereas classwise statistics can improve task accuracy with fine-grained information. Meanwhile, the introduction of class labels brings new challenges. Label perturbation may result in invalid items for aggregation. To this end, we propose frameworks for multi-class item mining, along with two mechanisms: validity perturbation to reduce the impact of invalid data, and correlated perturbation to preserve the relationship between labels and items. We also apply these optimized methods to two multi-class item mining queries: frequency estimation and top-$k$ item mining. Through theoretical analysis and extensive experiments, we verify the effectiveness and superiority of these methods.

Paper Structure

This paper contains 28 sections, 10 theorems, 17 equations, 11 figures, 3 tables.

Key Result

Theorem 1

(Privacy of Validity Perturbation Mechanism) The validity perturbation mechanism satisfies $\epsilon$-LDP for $\epsilon=\ln \frac{p(1-q)}{(1-p)q}$ [wang2017locally].
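
As a quick numerical check of the expression in Theorem 1, the sketch below evaluates $\epsilon$ from $p$ and $q$. It assumes the usual unary-encoding reading of these symbols, where $p$ is the probability that a 1-bit stays 1 and $q$ is the probability that a 0-bit is reported as 1. The function names are hypothetical, and the two parameter settings are the standard symmetric and optimized unary-encoding choices, used here as assumptions rather than as the paper's own configuration.

```python
import math


def ue_epsilon(p: float, q: float) -> float:
    """Privacy budget implied by Theorem 1: ln( p(1-q) / ((1-p)q) )."""
    if not (0.0 < q < p < 1.0):
        raise ValueError("expected 0 < q < p < 1")
    return math.log(p * (1.0 - q) / ((1.0 - p) * q))


def sue_probs(eps: float) -> tuple:
    """Symmetric unary encoding (assumed setting): p + q = 1."""
    half = math.exp(eps / 2.0)
    return half / (half + 1.0), 1.0 / (half + 1.0)


def oue_probs(eps: float) -> tuple:
    """Optimized unary encoding (assumed setting): p = 1/2, q = 1/(e^eps + 1)."""
    return 0.5, 1.0 / (math.exp(eps) + 1.0)


# Both settings should recover the target budget up to floating-point error.
eps_sue = ue_epsilon(*sue_probs(2.0))
eps_oue = ue_epsilon(*oue_probs(1.0))
```

Going in the other direction, from a target $\epsilon$ to a $(p, q)$ pair, is not unique, which is why the two settings above produce different probabilities for the same budget.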

Figures (11)

  • Figure 1: The overview illustrates two frameworks for multi-class item mining. The first framework, referred to as PTJ, treats a label-item pair as a whole and perturbs it to another pair. The second framework, PTS, perturbs each element separately.
  • Figure 2: An illustration of the encoding scheme. For a valid item, the encoded bits from UE are padded with a validity flag "0". Conversely, for an invalid item, all encoded bits are set to "0", and the validity flag is set to "1".
  • Figure 4: Each user receives random seeds and bucket states to generate a current shuffled result before perturbing her label-item pair. The choice of perturbation mechanism depends on the aggregation goal described in Algorithms \ref{gfim} and \ref{cstim}.
  • Figure 5: Empirical variance analysis.
  • Figure 6: RMSE from two real-world datasets with varying privacy budget $\epsilon$.
  • ...and 6 more figures
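
The encoding scheme described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the validity flag is placed in the last position (the caption only says the bits are "padded"), and `perturb_bits` is a generic unary-encoding bit flip applied uniformly to every bit, which may differ from how the paper treats the flag.

```python
import random
from typing import List, Optional


def encode_with_validity(item: Optional[int], domain_size: int) -> List[int]:
    """Unary-encode an item and append a validity flag.

    item=None stands for an invalid item: all item bits are 0 and the flag is 1.
    A valid item gets a one-hot encoding followed by flag 0.
    """
    bits = [0] * (domain_size + 1)  # last position is the validity flag (assumed)
    if item is None:
        bits[domain_size] = 1
    else:
        if not 0 <= item < domain_size:
            raise ValueError("item index out of range")
        bits[item] = 1
    return bits


def perturb_bits(bits: List[int], p: float, q: float,
                 rng: random.Random) -> List[int]:
    """Generic UE perturbation: report 1 w.p. p for a 1-bit, w.p. q for a 0-bit."""
    return [1 if rng.random() < (p if b == 1 else q) else 0 for b in bits]


valid_report = encode_with_validity(2, domain_size=4)       # one-hot + flag 0
invalid_report = encode_with_validity(None, domain_size=4)  # zeros + flag 1
noisy = perturb_bits(valid_report, p=0.5, q=0.25, rng=random.Random(0))
```

Either way, every report carries exactly one set bit before perturbation, so a valid item and an invalid item have the same shape on the wire.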

Theorems & Definitions (14)

  • Definition 1: Differential Privacy [dwork2006differential]
  • Definition 2: Local Differential Privacy [kasiviswanathan2011can, ye2020local]
  • Definition 3: Multi-class Frequency Estimation
  • Definition 4: Multi-class Top-$k$ Item Mining
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • ...and 4 more