Segmented Private Data Aggregation in the Multi-message Shuffle Model

Shaowei Wang; Hongqiao Chen; Sufen Zeng; Ruilin Yang; Hui Jiang; Peigen Ye; Kaiqi Yu; Rundong Mei; Shaozheng Huang; Wei Yang; Bangzhou Xin

TL;DR

This work addresses the need for flexible privacy protection in decentralized data collection by introducing agnostic segmented privacy within the multi-message shuffle differential privacy framework. The authors decouple user data from privacy-level choices, anonymize privacy preferences, and optimize the use of blanket messages to improve aggregation utility while maintaining DP guarantees. They provide a concrete protocol for set-valued data analyses with almost-tight privacy amplification bounds and validate substantial utility gains (up to about a 50% reduction in estimation error) on both real and synthetic datasets. The approach supports heterogeneous privacy preferences in decentralized data analytics while preserving the DP guarantees and improving utility, and it requires only a few messages per user across two interaction rounds.

Abstract

The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., internet of things, mobile edge networks). In particular, the multi-message shuffle model, where each user may contribute multiple messages, has been shown to achieve accuracy approaching that of the central model of DP. However, existing studies typically assume a uniform privacy protection level for all users, which may deter conservative users from participating and prevent liberal users from contributing more information, thereby reducing the overall data utility, such as the accuracy of aggregated statistics. In this work, we pioneer the study of segmented private data aggregation within the multi-message shuffle model of DP, introducing flexible privacy protection for users and enhanced utility for the aggregation server. Our framework not only protects users' data but also anonymizes their privacy level choices to prevent potential data leakage from these choices. To optimize the privacy-utility-communication trade-offs, we explore approximately optimal configurations for the number of blanket messages and conduct almost tight privacy amplification analyses within the shuffle model. Through extensive experiments, we demonstrate that our segmented multi-message shuffle framework achieves a reduction of about 50% in estimation error compared to existing approaches, significantly enhancing both privacy and utility.
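To make the roles of "blanket messages" and the shuffler concrete, here is a toy sketch of a generic multi-message shuffle histogram. It is not the authors' protocol: the function names, the uniform blanket distribution, the per-user blanket counts standing in for privacy levels, and the assumption that the server knows the total number of blanket messages are all illustrative choices made for this example.

```python
import random
from collections import Counter


def user_messages(item, domain_size, num_blanket, rng):
    """Hypothetical randomizer: the true item plus uniform blanket messages."""
    msgs = [item]
    msgs += [rng.randrange(domain_size) for _ in range(num_blanket)]
    return msgs


def shuffle_and_estimate(items, blanket_counts, domain_size, seed=0):
    """Pool all messages, shuffle them, and debias the observed histogram.

    blanket_counts[i] is the number of blanket messages user i sends; a
    larger value is used here as a stand-in for a stricter privacy level.
    """
    rng = random.Random(seed)
    pool = []
    for item, b in zip(items, blanket_counts):
        pool.extend(user_messages(item, domain_size, b, rng))

    # Shuffler: a uniformly random permutation removes the link between
    # a message and the user who sent it.
    rng.shuffle(pool)

    counts = Counter(pool)
    # Each blanket message hits a given value with probability 1/domain_size,
    # so subtract the expected blanket contribution from every bin.
    expected_noise = sum(blanket_counts) / domain_size
    return [counts.get(v, 0) - expected_noise for v in range(domain_size)]


if __name__ == "__main__":
    true_items = [0, 1, 1, 3, 3, 3]
    # Assumed setup: the first three users are "liberal" (2 blanket messages),
    # the last three are "conservative" (6 blanket messages).
    levels = [2, 2, 2, 6, 6, 6]
    print(shuffle_and_estimate(true_items, levels, domain_size=4))
```

The estimate is unbiased for the true histogram, and its variance grows with the total number of blanket messages. That is the utility side of the privacy-utility-communication trade-off the abstract refers to; the privacy side (how many blanket messages suffice for a given level) is what the paper's amplification analysis addresses and is not modeled here.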

Paper Structure (25 sections, 7 theorems, 23 equations, 6 figures, 1 table, 2 algorithms)

Introduction
Related Works
Shuffle Model of Differential Privacy
Segmented Privacy Preservation
Preliminaries
Differential Privacy and Its Segmented Version
The Shuffle Model of Differential Privacy
Privacy Amplification via Shuffling
A Personalized Multi-message Shuffle DP Framework
Trust Model
Design Goals
Overall Procedures
Simplification of Privacy Guarantee
A Protocol for Set-valued Data Analyses
Privacy Guarantees
...and 10 more sections

Key Result

Lemma 1

If a protocol $\mathcal{R}:\mathbb{X}^n\mapsto \mathbb{Z}$ satisfies $(\epsilon,\delta)$-differential privacy for all neighboring datasets, then for datasets $X, X'\in \mathbb{X}^n$ that differ in at most $s$ entries, $\mathcal{R}(X)$ and $\mathcal{R}(X')$ are $(s\cdot\epsilon, s\cdot e^{s\cdot \ep
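The lemma statement is cut off in this copy, so the exact $\delta$ term the authors use cannot be read from it. As a hedged illustration of how group composition behaves, the sketch below applies the standard chaining argument: walking from $X$ to $X'$ through $s$ neighboring datasets and applying $(\epsilon,\delta)$-DP at each step gives privacy parameter $s\epsilon$ and failure probability $\delta\sum_{i=0}^{s-1} e^{i\epsilon}$. The function name and the looser closed form $s\, e^{(s-1)\epsilon}\delta$ are choices made for this example, not taken from the paper.

```python
import math


def group_privacy_bound(eps, delta, s):
    """Textbook group-privacy parameters for datasets differing in s entries.

    Returns (s * eps, chained delta, looser closed-form delta). This is the
    standard chaining bound, not necessarily the constant used in the paper.
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    eps_group = s * eps
    # Unrolling P(X_0) <= e^eps * P(X_1) + delta over s steps leaves a
    # geometric sum of delta terms.
    delta_chain = delta * sum(math.exp(i * eps) for i in range(s))
    # Every term of the sum is at most e^{(s-1) eps}, giving a simpler bound.
    delta_loose = s * math.exp((s - 1) * eps) * delta
    return eps_group, delta_chain, delta_loose


if __name__ == "__main__":
    for s in (1, 2, 5):
        print(s, group_privacy_bound(eps=0.5, delta=1e-6, s=s))
```

Both $\delta$ terms grow exponentially in $s\epsilon$, which is why group composition is only useful when the number of differing entries is small.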

Figures (6)

Figure 1: Illustration of multi-message shuffle DP framework with agnostic segmented privacy preservation.
Figure 2: Experimental results on the MSNBC dataset with the level setting $S_1$.
Figure 3: Experimental results on synthetic dataset with $T=128$, and the level setting $S_1$.
Figure 4: Experimental results on the MSNBC dataset with the level setting $S_2$ (a,b) and setting $S_3$ (c,d).
Figure 5: Experimental results on synthetic dataset with $T=128$, and the level setting $S_2$.
...and 1 more figure

Theorems & Definitions (14)

Definition 1: Hockey-stick divergence
Definition 2: Differential privacy [dwork2006differential]
Definition 3: Local differential privacy [kasiviswanathan2011can]
Lemma 1: Group Composition of DP [dwork2006differential]
Definition 4: Data processing inequality
Definition 5: Segmented Differential Privacy
Definition 6: Differential Privacy in the Shuffle Model
Definition 7: Segmented Differential Privacy in the Shuffle Model
Theorem 1: Variation-ratio reduction [wang2023unified]
Lemma 2: Simplification of Privacy Guarantee
...and 4 more
