Scalable and Differentially Private Distributed Aggregation in the Shuffled Model
Badih Ghazi, Rasmus Pagh, Ameya Velingker
TL;DR
The paper advances differential privacy (DP) in the shuffled model by introducing the Invisibility Cloak protocol, which encodes each input $x_i$ as a set of $m$ random-looking numbers whose sum recovers the input. This enables private aggregation with polylogarithmic communication and error under two notions of neighboring datasets: sum-preserving changes and single-user changes. The DP guarantees for both notions are proved using a two-player reduction, $\gamma$-smoothness lemmas, and a zero-sum noise mechanism whose noise cancels in aggregation. The main contribution is a near-optimal trade-off: under sum-preserving changes, the worst-case error can be driven to $2^{-m}$ with $m$ messages per user, and under single-user changes, the expected error scales as $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\delta}}\right)$ with per-user communication of $O\left(\log\left(\frac{n}{\varepsilon\delta}\right)\right)$ messages of $O\left(\log\left(\frac{n}{\delta}\right)\right)$ bits each. The work also analyzes resilience to colluding users and connects to federated learning and sketch-based private analytics, highlighting practical implications for scalable, privacy-preserving distributed computation. Overall, it strengthens the shuffled-model paradigm as a viable middle ground between local DP and secure multi-party computation (MPC) for private aggregation in large-scale distributed settings.
Abstract
Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting where a curious adversary observing all communication to and from the server cannot learn any private information, assuming the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al., which was later analytically studied by Erlingsson et al. and Cheu et al. Recent papers by Cheu et al. and Balle et al. have given protocols for secure aggregation that achieve differential privacy guarantees in this "shuffled model". Their protocols come at a cost, though: either the expected aggregation error or the amount of communication per user scales as a polynomial $n^{\Omega(1)}$ in the number of users $n$. In this paper we propose a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increase only polylogarithmically in $n$. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum.
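The encoding idea described above (each input becomes $m$ random-looking numbers whose sum recovers the input) can be illustrated with a minimal Python sketch. It is not the authors' reference implementation. It assumes inputs are already discretized to integers, that arithmetic is done modulo a hypothetical parameter `modulus` chosen larger than the largest possible true sum, and that the function names `encode`, `shuffle`, and `aggregate` are illustrative only. It also omits the noise needed for privacy under single-user changes, so it shows only the "zero distortion on the sum" property.

```python
import secrets


def encode(x: int, m: int, modulus: int) -> list[int]:
    """Split x into m shares, uniform subject to summing to x mod modulus."""
    # The first m - 1 shares are independent and uniform in [0, modulus).
    shares = [secrets.randbelow(modulus) for _ in range(m - 1)]
    # The last share is fixed so that all m shares sum to x modulo `modulus`.
    shares.append((x - sum(shares)) % modulus)
    return shares


def shuffle(messages: list[int]) -> list[int]:
    """Stand-in for the trusted shuffler: return the messages in random order."""
    shuffled = list(messages)
    secrets.SystemRandom().shuffle(shuffled)
    return shuffled


def aggregate(messages: list[int], modulus: int) -> int:
    """Analyzer: sum all received messages modulo `modulus`."""
    return sum(messages) % modulus


if __name__ == "__main__":
    inputs = [3, 0, 7, 5]   # hypothetical discretized user inputs
    modulus = 101           # assumed to exceed the largest possible sum
    messages = [s for x in inputs for s in encode(x, m=4, modulus=modulus)]
    print(aggregate(shuffle(messages), modulus))
```

Because every user's shares sum to that user's input modulo `modulus`, the analyzer recovers the exact sum regardless of the order in which the shuffler delivers the messages, provided the true sum is smaller than `modulus`.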
