Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Hilal Asi; Vitaly Feldman; Jelani Nelson; Huy L. Nguyen; Kunal Talwar; Samson Zhou

TL;DR

This work analyzes private vector mean estimation in the shuffle model, focusing on both multi-message and single-message communication regimes. It develops a multi-message protocol that achieves near-central mean squared error $\mathsf{MSE} = \tilde{O}\left(\frac{d}{\varepsilon^{2}}\right)$ using $\tilde{O}\left(\min(d, n\varepsilon^{2})\right)$ messages per user, and proves a matching lower bound up to logarithmic factors on message complexity, establishing a phase transition in the multi-message setting. For the single-message regime, the authors design a protocol with $\mathsf{MSE} = \mathcal{O}_{\delta}\left(d n^{\frac{d}{d+2}} \varepsilon^{-\frac{4}{d+2}}\right)$ and prove a lower bound $\mathsf{MSE} = \Omega\left(d n^{\frac{d}{d+2}}\right)$ (for constant $\varepsilon$), showing near-optimality in this standard setting. The paper also analyzes robustness to malicious users, revealing that in the single-shuffler model, adversaries can induce large additive errors, while multi-shuffler arrangements can bound such effects more effectively, highlighting important trade-offs between accuracy, communication, and robustness.

Abstract

We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
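
As a rough worked comparison of the two rates stated in the abstract, one can divide the single-message error by the multi-message error. This is only a sketch: constants, logarithmic factors, and the dependence on $\delta$ are ignored, and the multi-message rate is taken to be $d/\varepsilon^2$.

$$
\frac{d\, n^{d/(d+2)}\, \varepsilon^{-4/(d+2)}}{d\, \varepsilon^{-2}}
= n^{d/(d+2)}\, \varepsilon^{2 - 4/(d+2)}
= n^{d/(d+2)}\, \varepsilon^{2d/(d+2)}
= \left(n\varepsilon^{2}\right)^{d/(d+2)}
$$

At this level of precision, restricting each user to a single message inflates the mean squared error by a factor of about $(n\varepsilon^2)^{d/(d+2)}$, which approaches $n\varepsilon^2$ as $d$ grows.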

Paper Structure (34 sections, 36 theorems, 89 equations, 6 algorithms)

Introduction
Our Contributions
Multiple messages per user.
Single message per user.
Robustness to malicious users.
Related Work
Real summation in the shuffle model.
Lower bounds for the multi-message shuffle model.
Mean estimation in the LDP model.
Preliminaries and problem setting
Notation.
Shuffle differential privacy.
Kashin representation
Multiple Messages
$\Omega(\min(\varepsilon^2 n, d))$ messages are necessary
...and 19 more sections

Key Result

Theorem 1.1

There exists an $(\varepsilon,\delta)$-DP mechanism for vector aggregation that uses $\tilde{\mathcal{O}}\left(\min(d,n\varepsilon^2)\right)$ messages per user and achieves mean squared error $\tilde{\mathcal{O}}\left(\frac{d}{\varepsilon^2}\right)$.
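
To get a feel for the scaling in Theorem 1.1, the following minimal sketch evaluates the leading-order expressions numerically. It is not the paper's protocol: the function names are hypothetical, all constants and logarithmic factors hidden by the $\tilde{\mathcal{O}}$ notation are dropped, and the single-message rate is taken from the abstract purely for comparison.

```python
import math


def messages_per_user(n: int, d: int, eps: float) -> float:
    """Leading-order message count min(d, n * eps^2) (log factors dropped)."""
    return min(d, n * eps ** 2)


def message_regime(n: int, d: int, eps: float) -> str:
    """Report which term of min(d, n * eps^2) is the binding one."""
    return "d" if d <= n * eps ** 2 else "n*eps^2"


def multi_message_mse(d: int, eps: float) -> float:
    """Leading-order multi-message error d / eps^2 (log factors dropped)."""
    return d / eps ** 2


def single_message_mse(n: int, d: int, eps: float) -> float:
    """Leading-order single-message error from the abstract:
    d * n^(d/(d+2)) * eps^(-4/(d+2)), with constants dropped."""
    return d * n ** (d / (d + 2)) * eps ** (-4 / (d + 2))


if __name__ == "__main__":
    # Illustrative parameter choices only; they do not come from the paper.
    for n, d, eps in [(10_000, 100, 1.0), (100, 1_000, 0.5)]:
        print(
            f"n={n}, d={d}, eps={eps}: "
            f"messages~{messages_per_user(n, d, eps):g} "
            f"(binding term: {message_regime(n, d, eps)}), "
            f"multi-message MSE~{multi_message_mse(d, eps):g}, "
            f"single-message MSE~{single_message_mse(n, d, eps):g}"
        )
```

The two parameter settings fall on opposite sides of the threshold $d = n\varepsilon^2$, which is where the binding term of $\min(d, n\varepsilon^2)$ switches.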

Theorems & Definitions (57)

Theorem 1.1
Theorem 1.2
Theorem 1.3
Theorem 1.4
Theorem 1.5
Definition 1.6: Differential privacy
Theorem 1.7: Advanced composition of differential privacy [DworkR14]
Lemma 1.8: Kashin's representation
Theorem 2.1
Lemma 2.1
...and 47 more
