Distribution-Aware Mean Estimation under User-level Local Differential Privacy

Corentin Pla, Hugo Richard, Maxime Vono

TL;DR

Based on a distribution-aware mean estimation algorithm, an upper bound on the worst-case risk over $\mu$ is established and a lower bound is derived for the task of mean estimation under user-level local differential privacy.

Abstract

We consider the problem of mean estimation under user-level local differential privacy, where $n$ users contribute through their local pools of data samples. Previous work assumes that the number of data samples is the same across users. In contrast, we consider a more general and realistic scenario where each user $u \in [n]$ owns $m_u$ data samples drawn from some generative distribution $\mu$, with $m_u$ unknown to the statistician but drawn from a known distribution $M$ over $\mathbb{N}^\star$. Based on a distribution-aware mean estimation algorithm, we establish an $M$-dependent upper bound on the worst-case risk over $\mu$ for the task of mean estimation. We then derive a lower bound. The two bounds match asymptotically up to logarithmic factors and reduce to known bounds when $m_u = m$ for every user $u$.
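To make the setting concrete, the following is a minimal simulation sketch of a naive user-level baseline, not the distribution-aware algorithm of the paper. It assumes samples bounded in $[-1, 1]$, takes $M$ to be uniform on $\{1, \dots, 20\}$, and uses a Laplace mechanism on each user's local mean; these choices, and the function names `simulate_users`, `privatize_user` and `estimate_mean`, are illustrative assumptions that do not come from the source.

```python
import random


def simulate_users(n, true_mean, rng):
    """Draw n users; user u holds m_u samples, with m_u ~ M.

    Assumption (illustrative only): M is uniform on {1, ..., 20} and
    samples are true_mean plus uniform noise, staying inside [-1, 1].
    """
    users = []
    for _ in range(n):
        m_u = rng.randint(1, 20)  # unknown to the statistician
        users.append([true_mean + rng.uniform(-0.2, 0.2) for _ in range(m_u)])
    return users


def privatize_user(samples, alpha, rng):
    """Release one alpha-LDP report computed from a user's whole pool."""
    local_mean = sum(samples) / len(samples)
    local_mean = max(-1.0, min(1.0, local_mean))  # enforce the [-1, 1] bound
    # Replacing the entire pool moves the clipped mean by at most 2,
    # so Laplace noise with scale 2 / alpha gives user-level alpha-LDP.
    scale = 2.0 / alpha
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return local_mean + noise


def estimate_mean(users, alpha, rng):
    """Server side: average the privatized reports."""
    reports = [privatize_user(samples, alpha, rng) for samples in users]
    return sum(reports) / len(reports)


if __name__ == "__main__":
    rng = random.Random(0)
    users = simulate_users(n=20000, true_mean=0.3, rng=rng)
    print(estimate_mean(users, alpha=1.0, rng=rng))
```

This baseline adds the same noise regardless of $m_u$, so it makes no use of the known distribution $M$; it only illustrates what is privatized at the user level and is the kind of naive approach a distribution-aware estimator would be compared against.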

Paper Structure

This paper contains 10 sections, 16 theorems, 93 equations, 1 figure, 2 tables, 1 algorithm.

Key Result

Theorem 1

Assume that assumptions ass:distribution through ass:high privacy hold. Then there exist constants $c_1, c_2 > 0$, independent of $\alpha$, $n$ and $m$, such that the following lower bound holds: The positive constants $c_1$ and $c_2$ are given explicitly in app:lb2.

Theorems & Definitions (16)

  • Theorem 1: Lower bound
  • Theorem 2: Upper bound
  • Corollary 1: Upper bound
  • Theorem S2: Lower bound
  • Lemma S1: Decomposition of TV distance by number of users
  • Lemma S2: TV and KL distance between $\mu_0$ and $\mu_1$
  • Theorem S2: Upper bound
  • Lemma S3
  • Lemma S4: Hoeffding bound
  • Lemma S5: Upper bound on $\mathbb{P}(\overline{A})$
  • ...and 6 more