From Randomized Response to Randomized Index: Answering Subset Counting Queries with Local Differential Privacy

Qingqing Ye, Liantong Yu, Kai Huang, Xiaokui Xiao, Weiran Liu, Haibo Hu

TL;DR

This paper tackles subset counting queries under Local Differential Privacy for set-valued data by introducing CRI, an index-based counting mechanism that achieves index-level deniability without perturbing the values. To further boost utility and scalability, it extends CRI into CRIAD with augmented dummy bits and a flexible multi-dummy, multi-sample, multi-group design, accompanied by formal privacy guarantees, unbiasedness, and variance bounds. The authors provide rigorous analyses and demonstrate that CRIAD substantially outperforms traditional value-perturbation LDP methods on real-world datasets across various privacy budgets and category sizes. The work offers practical implications for private analytics of category-level statistics and lays the groundwork for scalable, privacy-preserving analytics in federated or heterogeneous settings.

Abstract

Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. Existing perturbation mechanisms typically require perturbing the original values to ensure acceptable privacy, which inevitably results in value distortion and utility deterioration. In this work, we propose an alternative approach: instead of perturbing values, we apply randomization to the indexes of values while ensuring rigorous LDP guarantees. Inspired by the deniability of randomized indexes, we present CRIAD for answering subset counting queries on set-valued data. By integrating a multi-dummy, multi-sample, and multi-group strategy, CRIAD serves as a fully scalable solution that offers flexibility across various privacy requirements and domain sizes, and achieves more accurate query results than existing methods. Through comprehensive theoretical analysis and extensive experimental evaluations, we validate the effectiveness of CRIAD and demonstrate its superiority over traditional value-perturbation mechanisms.

Paper Structure

This paper contains 26 sections, 11 theorems, 40 equations, 4 figures, 7 tables, 3 algorithms.

Key Result

Theorem 2.2

(Sequential Composition) Given $t$ randomized algorithms $\mathcal{A}_i$ ($1 \leq i \leq t$), each providing $\epsilon_i$-local differential privacy, the sequence of algorithms $\mathcal{A}_1, \dots, \mathcal{A}_t$ collectively provides $\left(\sum_{i=1}^{t} \epsilon_i\right)$-local differential privacy.
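
As a rough illustration of how sequential composition is used for budget accounting, the sketch below sums per-mechanism budgets and, in the other direction, splits a total budget evenly across $t$ mechanisms. The function names `composed_budget` and `split_budget` are hypothetical helpers introduced here for illustration; they are not taken from the paper, and the even split is only one possible allocation.

```python
import math


def composed_budget(epsilons):
    """Total budget of a sequence of LDP mechanisms under sequential
    composition: the sum of the individual budgets epsilon_i."""
    if any(e < 0 for e in epsilons):
        raise ValueError("privacy budgets must be non-negative")
    # fsum avoids accumulating floating-point rounding error.
    return math.fsum(epsilons)


def split_budget(total_epsilon, t):
    """Evenly divide a total budget across t mechanisms (an assumed,
    simplest-possible allocation), so the parts compose back to the total."""
    if t <= 0:
        raise ValueError("t must be a positive integer")
    return [total_epsilon / t] * t


# Three mechanisms run on the same user's data, each with its own budget.
budgets = [0.5, 0.3, 0.2]
total = composed_budget(budgets)

# Conversely, a total budget of 1.0 shared by four mechanisms.
shares = split_budget(1.0, 4)
```

Here `total` is the overall budget guaranteed for the three mechanisms taken together, and each entry of `shares` is the budget a single mechanism may spend so that the four of them together stay within the total.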

Figures (4)

  • Figure 1: Workflow of CRI for Answering Subset Counting Queries.
  • Figure 2: Overall performance on real-world datasets with varying privacy budgets.
  • Figure 3: MRE of different methods with varying category size ($\epsilon=1$).
  • Figure 4: MRE of 100 random parameter combinations of $(m,s,g)$.

Theorems & Definitions (23)

  • Definition 2.1
  • Theorem 2.2
  • Definition 2.3
  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 4.1
  • ...and 13 more