Segmented Private Data Aggregation in the Multi-message Shuffle Model

Shaowei Wang, Hongqiao Chen, Sufen Zeng, Ruilin Yang, Hui Jiang, Peigen Ye, Kaiqi Yu, Rundong Mei, Shaozheng Huang, Wei Yang, Bangzhou Xin

TL;DR

This work addresses the need for flexible privacy protection in decentralized data collection by introducing agnostic segmented privacy within the multi-message shuffle differential privacy framework. The authors decouple user data from privacy-level choices, anonymize privacy preferences, and optimize the use of blanket messages to improve aggregation utility while maintaining DP guarantees. They provide a concrete protocol for set-valued data analyses with almost-tight privacy amplification bounds and validate it on both real and synthetic datasets, reporting up to about a 50% reduction in estimation error. The approach supports heterogeneous privacy preferences in decentralized data analytics without sacrificing privacy or utility, and it requires only a few messages per user across two interaction rounds.
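To make the idea of segmented privacy levels concrete, here is a minimal Python sketch under loudly stated assumptions: users pick one of three hypothetical ε levels, each user sends a single k-ary randomized-response report tagged with its level index, and the shuffler only permutes these (level, report) pairs. The names `krr` and `segmented_estimate` are illustrative. This is not the paper's protocol, which additionally uses blanket messages, and the paper's handling of privacy-level choices may differ from these plain tags.

```python
import math
import random
from collections import Counter, defaultdict


def krr(x, d, eps, rng):
    """k-ary randomized response: keep x w.p. e^eps/(e^eps + d - 1), else a uniform other value."""
    if rng.random() < math.exp(eps) / (math.exp(eps) + d - 1):
        return x
    y = rng.randrange(d - 1)          # uniform over the d - 1 values other than x
    return y + 1 if y >= x else y


def segmented_estimate(tagged_reports, d, eps_levels):
    """Debias each level's reports with that level's eps, then sum the per-level count estimates."""
    by_level = defaultdict(list)
    for level, report in tagged_reports:
        by_level[level].append(report)
    total = [0.0] * d
    for level, reports in by_level.items():
        eps, n_level = eps_levels[level], len(reports)
        denom = math.exp(eps) + d - 1
        p, q = math.exp(eps) / denom, 1.0 / denom   # P(report = v | x = v), P(report = v | x != v)
        counts = Counter(reports)
        for v in range(d):
            total[v] += (counts[v] - n_level * q) / (p - q)
    return total


# Toy run: three hypothetical levels, each user's choice drawn independently of their value.
rng = random.Random(1)
d, n = 5, 60000
eps_levels = [0.5, 1.0, 2.0]
data = rng.choices(range(d), weights=[5, 4, 3, 2, 1], k=n)
levels = rng.choices(range(len(eps_levels)), k=n)
tagged = [(lv, krr(x, d, eps_levels[lv], rng)) for x, lv in zip(data, levels)]
rng.shuffle(tagged)  # the shuffler unlinks users from their (level, report) pairs
est = segmented_estimate(tagged, d, eps_levels)
true_counts = [data.count(v) for v in range(d)]
```

Because each level is debiased with its own ε and the per-level counts are simply summed, the estimate stays unbiased even if level choices were correlated with the data; larger-ε (more liberal) levels contribute lower-variance counts, which is one way liberal users can add information.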

Abstract

The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., the Internet of Things and mobile edge networks). In particular, the multi-message shuffle model, where each user may contribute multiple messages, has been shown to achieve accuracy approaching that of the central model of DP. However, existing studies typically assume a uniform privacy protection level for all users, which may deter conservative users from participating and prevent liberal users from contributing more information, thereby reducing overall data utility, such as the accuracy of aggregated statistics. In this work, we pioneer the study of segmented private data aggregation within the multi-message shuffle model of DP, introducing flexible privacy protection for users and enhanced utility for the aggregation server. Our framework not only protects users' data but also anonymizes their privacy-level choices to prevent potential data leakage from these choices. To optimize the privacy-utility-communication trade-offs, we explore approximately optimal configurations for the number of blanket messages and conduct almost-tight privacy amplification analyses within the shuffle model. Through extensive experiments, we demonstrate that our segmented multi-message shuffle framework reduces estimation error by about 50% compared with existing approaches, significantly enhancing both privacy and utility.
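The multi-message setting described in the abstract, where users send extra blanket messages through a shuffler, can be illustrated for frequency estimation with the Python sketch below. It assumes k-ary randomized response as the local randomizer, uniformly random blanket messages, and a single privacy level; the function names (`local_randomizer`, `shuffler`, `estimate_counts`) and the toy parameters are hypothetical, and the paper's actual randomizers and blanket-message counts may differ.

```python
import math
import random
from collections import Counter


def local_randomizer(x, d, eps, num_blanket, rng):
    """One k-ary randomized-response report of x plus num_blanket uniform blanket messages."""
    denom = math.exp(eps) + d - 1
    if rng.random() < math.exp(eps) / denom:
        report = x                      # keep the true value with probability p = e^eps / denom
    else:
        report = rng.randrange(d - 1)   # otherwise pick uniformly among the other d - 1 values
        if report >= x:
            report += 1
    blanket = [rng.randrange(d) for _ in range(num_blanket)]  # data-independent cover messages
    return [report] + blanket


def shuffler(all_messages, rng):
    """Pool every user's messages and return them in uniformly random order."""
    pooled = [msg for user_msgs in all_messages for msg in user_msgs]
    rng.shuffle(pooled)
    return pooled


def estimate_counts(shuffled, n, d, eps, num_blanket):
    """Unbiased count estimates: remove expected blanket counts, then invert k-RR."""
    denom = math.exp(eps) + d - 1
    p, q = math.exp(eps) / denom, 1.0 / denom
    counts = Counter(shuffled)
    expected_blanket = n * num_blanket / d
    return [(counts[v] - expected_blanket - n * q) / (p - q) for v in range(d)]


# Toy run with hypothetical parameters: 4 categories, 2 blanket messages per user.
rng = random.Random(7)
d, n, eps, m = 4, 50000, 1.0, 2
data = rng.choices(range(d), weights=[0.4, 0.3, 0.2, 0.1], k=n)
pooled = shuffler([local_randomizer(x, d, eps, m, rng) for x in data], rng)
est = estimate_counts(pooled, n, d, eps, m)
true_counts = [data.count(v) for v in range(d)]
print([round(e / n, 3) for e in est], [round(c / n, 3) for c in true_counts])
```

In this sketch each blanket message falls into a given category with probability $1/d$, so $m$ blanket messages per user add about $nm\,\tfrac{1}{d}(1-\tfrac{1}{d})$ variance to every category count before debiasing. More blanket messages generally hide individual reports better but cost utility and communication, which is the trade-off the abstract refers to when tuning the number of blanket messages.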

Paper Structure

This paper contains 25 sections, 7 theorems, 23 equations, 6 figures, 1 table, and 2 algorithms.

Key Result

Lemma 1

If a protocol $\mathcal{R}:\mathbb{X}^n\mapsto \mathbb{Z}$ satisfies $(\epsilon,\delta)$-differential privacy for all neighboring datasets, then for any datasets $X, X'\in \mathbb{X}^n$ that differ in at most $s$ entries, $\mathcal{R}(X)$ and $\mathcal{R}(X')$ are $(s\cdot\epsilon, s\cdot e^{s\cdot \epsilon}\cdot\delta)$-indistinguishable.
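Lemma 1 can be sanity-checked numerically with the hockey-stick divergence (Definition 1), using the standard characterization that a mechanism is $(\epsilon,\delta)$-DP exactly when $D_{e^\epsilon}(\mathcal{R}(X)\,\|\,\mathcal{R}(X')) \le \delta$ for all neighboring $X, X'$. The Python sketch below uses a hypothetical "leaky" randomized response (reveal the bit with probability $\delta$, otherwise binary randomized response) applied independently to each entry, and the group bound in the form $(s\cdot\epsilon, s\cdot e^{s\cdot\epsilon}\cdot\delta)$. It is a toy check of one mechanism, not a proof of the lemma, and all function names are illustrative.

```python
import itertools
import math


def hockey_stick(P, Q, eps):
    """D_{e^eps}(P || Q) = sum over outcomes z of max(0, P(z) - e^eps * Q(z))."""
    return sum(max(0.0, pz - math.exp(eps) * Q.get(z, 0.0)) for z, pz in P.items())


def leaky_rr(bit, eps, delta):
    """Hypothetical (eps, delta)-DP randomizer: reveal the bit w.p. delta, else binary randomized response."""
    keep = (1 - delta) * math.exp(eps) / (1 + math.exp(eps))
    flip = (1 - delta) / (1 + math.exp(eps))
    return {("reveal", bit): delta, ("rr", bit): keep, ("rr", 1 - bit): flip}


def product_dist(dists):
    """Joint output distribution when every entry is randomized independently."""
    joint = {}
    for combo in itertools.product(*(dist.items() for dist in dists)):
        outcome = tuple(z for z, _ in combo)
        joint[outcome] = math.prod(pr for _, pr in combo)
    return joint


def group_privacy(eps, delta, s):
    """Group-level parameters in the form (s*eps, s*e^{s*eps}*delta)."""
    return s * eps, s * math.exp(s * eps) * delta


# Toy check with s = 3: X and X_prime differ in every entry.
eps, delta, s = 0.5, 1e-3, 3
X, X_prime = [0] * s, [1] * s
single = hockey_stick(leaky_rr(0, eps, delta), leaky_rr(1, eps, delta), eps)
eps_s, delta_s = group_privacy(eps, delta, s)
group = hockey_stick(product_dist([leaky_rr(b, eps, delta) for b in X]),
                     product_dist([leaky_rr(b, eps, delta) for b in X_prime]), eps_s)
print(single, group, delta_s)  # measured divergences vs. the group bound
```

For these parameters, `single` equals $\delta$ (up to floating-point error) and `group` equals $1-(1-\delta)^3 \approx 0.0030$, below `delta_s` $= 3\cdot e^{1.5}\cdot\delta \approx 0.0134$.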

Figures (6)

  • Figure 1: Illustration of multi-message shuffle DP framework with agnostic segmented privacy preservation.
  • Figure 2: Experimental results on the MSNBC dataset with the level setting $S_1$.
  • Figure 3: Experimental results on the synthetic dataset with $T=128$ and the level setting $S_1$.
  • Figure 4: Experimental results on the MSNBC dataset with the level setting $S_2$ (a,b) and setting $S_3$ (c,d).
  • Figure 5: Experimental results on the synthetic dataset with $T=128$ and the level setting $S_2$.
  • ...and 1 more figure

Theorems & Definitions (14)

  • Definition 1: Hockey-stick divergence
  • Definition 2: Differential privacy dwork2006differential
  • Definition 3: Local differential privacy kasiviswanathan2011can
  • Lemma 1: Group Composition of DP dwork2006differential
  • Definition 4: Data processing inequality
  • Definition 5: Segmented Differential Privacy
  • Definition 6: Differential Privacy in the Shuffle Model
  • Definition 7: Segmented Differential Privacy in the Shuffle Model
  • Theorem 1: Variation-ratio reduction wang2023unified
  • Lemma 2: Simplification of Privacy Guarantee
  • ...and 4 more