Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar, Samson Zhou
TL;DR
This work analyzes private vector mean estimation in the shuffle model in both the multi-message and single-message communication regimes. It develops a multi-message protocol that achieves mean squared error close to that of the central model, $\mathsf{MSE} = \tilde{O}\left(\frac{d}{\varepsilon^{2}}\right)$, using $\tilde{O}\left(\min(n\varepsilon^{2}, d)\right)$ messages per user, and proves a lower bound on message complexity that matches up to logarithmic factors, so optimal error genuinely requires many messages per user. For the single-message regime, the authors design a protocol with $\mathsf{MSE} = \mathcal{O}_{\delta}\left(d n^{\frac{d}{d+2}} \varepsilon^{-\frac{4}{d+2}}\right)$ and prove a lower bound $\mathsf{MSE} = \Omega\left(d n^{\frac{d}{d+2}}\right)$, which shows the protocol is optimal in the standard setting of constant $\varepsilon$. The paper also analyzes robustness to malicious users: with a single shuffler, adversaries can induce large additive error, while multi-shuffler arrangements can bound such effects more effectively, highlighting trade-offs between accuracy, communication, and robustness.
Abstract
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
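To get a feel for how the stated rates scale, the following minimal Python sketch evaluates the asymptotic expressions from the summary above with all hidden constants set to 1 and all logarithmic factors dropped. Both simplifications are assumptions made purely for illustration, and the function names are hypothetical; nothing here implements the paper's protocols, and the outputs are scaling expressions rather than actual error values.

```python
import math


def multi_message_count(n: int, d: int, eps: float) -> float:
    """Messages per user, min(n * eps^2, d), ignoring constants and log factors."""
    return min(n * eps**2, d)


def central_mse(d: int, eps: float) -> float:
    """Error rate d / eps^2 achieved by the multi-message protocol (constants dropped)."""
    return d / eps**2


def single_message_mse(n: int, d: int, eps: float) -> float:
    """Single-message upper bound d * n^(d/(d+2)) * eps^(-4/(d+2)) (constants dropped)."""
    return d * n ** (d / (d + 2)) * eps ** (-4 / (d + 2))


def single_message_lower_bound(n: int, d: int) -> float:
    """Single-message lower bound d * n^(d/(d+2)) (constants dropped)."""
    return d * n ** (d / (d + 2))


if __name__ == "__main__":
    # Illustrative parameter choices only; not taken from the paper.
    n, d, eps = 10_000, 100, 1.0
    print("messages per user ~", multi_message_count(n, d, eps))
    print("multi-message MSE ~", central_mse(d, eps))
    print("single-message MSE ~", single_message_mse(n, d, eps))
    print("single-message lower bound ~", single_message_lower_bound(n, d))
```

With $\varepsilon = 1$ the single-message upper and lower expressions coincide, which mirrors the optimality claim for $\varepsilon = \Theta(1)$. The exponent $d/(d+2)$ approaches 1 as $d$ grows, so in high dimension the single-message rate grows almost linearly in $n$, whereas the multi-message rate $d/\varepsilon^2$ has no dependence on $n$.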
