Scalable and Differentially Private Distributed Aggregation in the Shuffled Model

Badih Ghazi, Rasmus Pagh, Ameya Velingker

TL;DR

The paper advances differential privacy in the shuffled model by introducing the Invisibility Cloak protocol, which encodes each input $x_i$ into a set of $m$ random-looking numbers whose sum recovers the input. This enables private aggregation with polylogarithmic communication and error under two notions of neighboring datasets. The paper proves DP guarantees for both sum-preserving and single-user changes, using a two-player reduction, gamma-smoothness lemmas, and a zero-sum noise mechanism that cancels in aggregation. The main contribution is a near-optimal trade-off: under sum-preserving changes, worst-case error can be driven to $2^{-m}$ with $m$ messages per user, and under single-user changes, expected error scales as $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\delta}}\right)$ with per-user communication of $O\left(\log\left(\frac{n}{\varepsilon\delta}\right)\right)$ messages of size $O\left(\log\left(\frac{n}{\delta}\right)\right)$ bits. The work also analyzes resilience to colluding users and connects to federated learning and sketch-based private analytics, highlighting practical implications for scalable, privacy-preserving distributed computation. Overall, it strengthens the shuffled-model paradigm as a viable middle ground between local DP and MPC for private aggregation in large-scale distributed settings.
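The encode, shuffle and analyze steps described above can be sketched as follows. This is a minimal illustration, assuming inputs are already integers in $\mathbb{Z}_q$ for a hypothetical modulus q; the function names and the values of q and m are hypothetical, and no attempt is made to choose them for a given privacy level. Each user draws m-1 uniform shares and sets the last one so that all m shares sum to $x_i$ mod q; the shuffler pools and permutes every message, and the analyzer simply adds them up.

```python
import random

def cloak_encode(x, m, q, rng):
    """Split x in Z_q into m additive shares (hypothetical helper).

    The first m-1 shares are uniform in [0, q); the last one is chosen
    so that all m shares sum to x modulo q.
    """
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)
    return shares

def shuffle_messages(per_user_shares, rng):
    """Model the shuffler: pool every user's messages and permute them,
    so the analyzer cannot tell which user sent which message."""
    msgs = [s for shares in per_user_shares for s in shares]
    rng.shuffle(msgs)
    return msgs

def analyze(msgs, q):
    """Model the analyzer: the modular sum of all messages equals the
    modular sum of all inputs, so aggregation adds no distortion."""
    return sum(msgs) % q

rng = random.Random(0)
q = 2**16                        # assumed modulus, larger than any true sum
m = 4                            # assumed number of messages per user
inputs = [3, 141, 59, 26, 5358]
encoded = [cloak_encode(x, m, q, rng) for x in inputs]
msgs = shuffle_messages(encoded, rng)
print(analyze(msgs, q))          # 5587, the exact sum of the inputs
```

Any m-1 of one user's shares are jointly uniform, so no single message carries information on its own; in this sketch the only thing the analyzer can exploit is the pooled multiset of n·m messages, whose modular sum is exactly the true sum.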

Abstract

Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting where a curious adversary observing all communication to and from the server cannot learn any private information, assuming the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al., which was later analytically studied by Erlingsson et al. and Cheu et al. The recent papers by Cheu et al. and Balle et al. have given protocols for secure aggregation that achieve differential privacy guarantees in this "shuffled model". Their protocols come at a cost, though: either the expected aggregation error or the amount of communication per user scales as a polynomial $n^{\Omega(1)}$ in the number of users $n$. In this paper we propose a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increases only polylogarithmically in $n$. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum.

Paper Structure

This paper contains 27 sections, 18 theorems, 68 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1

Let $\varepsilon > 0$ and $\delta \in (0,1)$ be any real numbers. There exists a protocol in the shuffled model that is $(\varepsilon,\delta)$-differentially private under single-user changes, has expected error $O(\frac{1}{\varepsilon} \sqrt{\log\frac{1}{\delta}})$, and where each encoder sends $O(\log(\frac{n}{\varepsilon\delta}))$ messages of $O(\log(\frac{n}{\delta}))$ bits.
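Under single-user changes, the exact sum itself changes when one user's input changes, so releasing it exactly is not differentially private and some noise must reach the aggregate. The sketch below illustrates that requirement with the standard two-sided geometric (discrete Laplace) mechanism as a swapped-in stand-in; it is not the construction behind Theorem 1, the function names are hypothetical, and who generates the noise (and how it would be split among users before cloaking) is left open. It only models the value the analyzer would recover: the true sum plus integer noise with $\Pr[Z = z] \propto e^{-\varepsilon |z| / \Delta}$ for an assumed sensitivity $\Delta$.

```python
import math
import random

def geometric(alpha, rng):
    """Sample K >= 0 with P(K = k) = (1 - alpha) * alpha**k by inversion.

    P(K >= k) = P(u <= alpha**k) = alpha**k for u uniform on (0, 1].
    """
    u = 1.0 - rng.random()                # u in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(alpha))

def two_sided_geometric(eps, rng, sensitivity=1):
    """Discrete Laplace noise: P(Z = z) proportional to exp(-eps*|z|/sensitivity).

    The difference of two i.i.d. geometric variables has exactly this law.
    """
    alpha = math.exp(-eps / sensitivity)
    return geometric(alpha, rng) - geometric(alpha, rng)

def noisy_sum(values, eps, rng, sensitivity=1):
    """Hypothetical central-DP view: exact integer sum plus discrete Laplace noise."""
    return sum(values) + two_sided_geometric(eps, rng, sensitivity)

rng = random.Random(1)
print(noisy_sum([3, 141, 59, 26, 5358], eps=1.0, rng=rng))
```

The expected absolute error of this noise is $2\alpha/(1-\alpha^2)$ with $\alpha = e^{-\varepsilon/\Delta}$, which grows like $2\Delta/\varepsilon$ as $\varepsilon \to 0$; integer-valued noise is used here because it composes naturally with the modular encoding.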

Figures (2)

  • Figure 1: Comparison of differentially private aggregation protocols in the shuffled model with $(\varepsilon,\delta)$-differential privacy. The number of users is $n$, and $m$ is an integer parameter. Message sizes are in bits; asymptotic notation is suppressed for readability. We consider two types of privacy protection, corresponding to different notions of "neighboring dataset" in differential privacy. In the first, which was considered in previous papers, datasets are considered neighboring if they differ in the data of a single user. In the second, datasets are considered neighboring if they have the same sum.
  • Figure 2: Diagram of the Invisibility Cloak Protocol for secure multi-party aggregation
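The two notions of neighboring datasets in Figure 1 can be made concrete with two small predicates. This is a sketch assuming both datasets contain the same number of users $n$; the function names are hypothetical. The example shows that neither notion contains the other: changing one user's value usually changes the sum, while swapping two users' values keeps the sum but touches two users.

```python
def neighbors_single_user(x, y):
    """Same number of users, and exactly one user's value differs."""
    return len(x) == len(y) and sum(a != b for a, b in zip(x, y)) == 1

def neighbors_sum_preserving(x, y):
    """Same number of users, and the two datasets have the same sum."""
    return len(x) == len(y) and sum(x) == sum(y)

x = [1, 2, 3]
y = [1, 2, 4]   # one user changed: neighbors only under single-user changes
z = [2, 1, 3]   # two users swapped: neighbors only under sum-preserving changes

print(neighbors_single_user(x, y), neighbors_sum_preserving(x, y))  # True False
print(neighbors_single_user(x, z), neighbors_sum_preserving(x, z))  # False True
```

Because an aggregation protocol must reveal the sum, privacy under sum-preserving changes can in principle come with zero error, while privacy under single-user changes requires the released sum itself to be perturbed.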

Theorems & Definitions (37)

  • Theorem 1
  • Theorem 2
  • Definition 1
  • Definition 2
  • Lemma 1
  • proof: Proof of the $\gamma$-smoothness lemma
  • Corollary 1
  • proof
  • Lemma 2
  • proof: Proof of the two-player set-probability lemma
  • ...and 27 more