Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar, Samson Zhou

TL;DR

This work analyzes private vector mean estimation in the shuffle model, focusing on both multi-message and single-message communication regimes. It develops a multi-message protocol that achieves near-central mean squared error $\mathsf{MSE} = \tilde{O}\left(\frac{d}{\varepsilon^{2}}\right)$ using $\tilde{O}\left(\min(d, n\varepsilon^{2})\right)$ messages per user, and proves a matching lower bound up to logarithmic factors on message complexity, establishing a phase transition in the multi-message setting. For the single-message regime, the authors design a protocol with $\mathsf{MSE} = \mathcal{O}_{\delta}\left(d n^{\frac{d}{d+2}} \varepsilon^{-\frac{4}{d+2}}\right)$ and prove a lower bound $\mathsf{MSE} = \Omega\left(d n^{\frac{d}{d+2}}\right)$ (for constant $\varepsilon$), showing near-optimality in this standard setting. The paper also analyzes robustness to malicious users, revealing that in the single-shuffler model, adversaries can induce large additive errors, while multi-shuffler arrangements can bound such effects more effectively, highlighting important trade-offs between accuracy, communication, and robustness.

Abstract

We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in \mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
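
As a reading aid, the two regimes of the message bound and the gap between the single-message bounds can be written out explicitly. This is a sketch that ignores constants and logarithmic factors; the numeric values are hypothetical and chosen only for illustration.

$$
\min(n\varepsilon^2, d) =
\begin{cases}
d, & n\varepsilon^2 \ge d,\\
n\varepsilon^2, & n\varepsilon^2 < d.
\end{cases}
$$

For instance, with $\varepsilon = 1$ and $d = 1000$, taking $n = 10^6$ falls in the first case (on the order of $d = 1000$ messages per user), while $n = 100$ falls in the second (on the order of $n\varepsilon^2 = 100$ messages per user). For the single-message setting, dividing the stated upper bound by the stated lower bound gives

$$
\frac{d\, n^{d/(d+2)}\, \varepsilon^{-4/(d+2)}}{d\, n^{d/(d+2)}} = \varepsilon^{-4/(d+2)},
$$

which is $\Theta(1)$ whenever $\varepsilon = \Theta(1)$, consistent with the optimality claim for that setting.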

Paper Structure

This paper contains 34 sections, 36 theorems, 89 equations, and 6 algorithms.

Key Result

Theorem 1.1

There exists an $(\varepsilon,\delta)$-DP mechanism for vector aggregation that uses $\tilde{\mathcal{O}}\left(\min(d,n\varepsilon^2)\right)$ messages per user and achieves mean squared error $\tilde{\mathcal{O}}\left(\frac{d}{\varepsilon^2}\right)$.
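
To get a feel for how these rates compare, the following is a minimal sketch that evaluates the asymptotic expressions quoted on this page for concrete parameters. It assumes all hidden constants are 1 and drops logarithmic factors, so the outputs are orders of magnitude only, not guarantees from the paper. The function name `shuffle_rates` and the example parameters are hypothetical.

```python
def shuffle_rates(n: int, d: int, eps: float) -> dict:
    """Evaluate the asymptotic rates with constants and log factors dropped.

    Assumption: every hidden constant is treated as 1, so the values are
    illustrative orders of magnitude rather than exact bounds.
    """
    if n <= 0 or d <= 0 or eps <= 0:
        raise ValueError("n, d and eps must be positive")

    # Multi-message protocol: messages per user ~ min(d, n * eps^2).
    messages = min(d, n * eps**2)
    # Multi-message protocol: mean squared error ~ d / eps^2.
    multi_mse = d / eps**2

    # Single-message protocol: MSE ~ d * n^(d/(d+2)) * eps^(-4/(d+2)).
    exponent = d / (d + 2)
    single_upper = d * n**exponent * eps ** (-4 / (d + 2))
    # Single-message lower bound (stated for constant eps): d * n^(d/(d+2)).
    single_lower = d * n**exponent

    return {
        "messages_per_user": messages,
        "multi_message_mse": multi_mse,
        "single_message_mse_upper": single_upper,
        "single_message_mse_lower": single_lower,
    }


# Hypothetical parameters: one million users, 100 dimensions, eps = 1.
example = shuffle_rates(n=10**6, d=100, eps=1.0)
```

With these hypothetical parameters, $n\varepsilon^2 \ge d$, so the message count is governed by $d$, and the single-message error exceeds the multi-message error by roughly a factor of $n^{d/(d+2)}$.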

Theorems & Definitions (57)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5
  • Definition 1.6: Differential privacy
  • Theorem 1.7: Advanced composition of differential privacy [DworkR14]
  • Lemma 1.8: Kashin's representation
  • Theorem 2.1
  • Lemma 2.1
  • ...and 47 more