Differentially-Private Collaborative Online Personalized Mean Estimation

Yauhen Yakimenka, Chung-Wei Weng, Hsuan-Yin Lin, Eirik Rosnes, Jörg Kliewer

TL;DR

The work addresses online personalized mean estimation across multiple agents under differential privacy. It introduces two privacy mechanisms (PM-I and PM-II) and two data-variance estimation schemes, coupled with a hypothesis-testing-based decision rule to identify agents with the same mean, and a linear statistic to fuse information from selected peers. Theoretical results show faster convergence than fully local methods under Bernstein-type conditions, with analytical ideal/oracle performance curves and extensive simulations confirming the benefits of private collaboration. The findings demonstrate that online private collaboration can closely match ideal performance in practice, motivating practical deployment with privacy guarantees and paving the way for future work on communication-efficient designs.

Abstract

We consider the problem of collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data according to arbitrary unknown agent-specific distributions. In particular, we provide a method based on hypothesis testing coupled with differential privacy and data variance estimation. Two privacy mechanisms and two data variance estimation schemes are proposed, and we provide a theoretical convergence analysis of the proposed algorithm for any bounded unknown distributions on the agents' data, showing that collaboration provides faster convergence than a fully local approach where agents do not share data. Moreover, we provide analytical performance curves for the case with an oracle class estimator, i.e., when the class structure of the agents is known, where agents receiving data from distributions with the same mean are considered to be in the same class. The theoretical faster-than-local convergence guarantee is backed up by extensive numerical results showing that, for a considered scenario, the proposed approach indeed converges much faster than a fully local approach and performs comparably to ideal performance where all data is public. This illustrates the benefit of private collaboration in an online setting.
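
To make the idea of a hypothesis-testing-based class decision concrete, here is a minimal sketch. It is not the paper's algorithm: it swaps in a plain Hoeffding confidence radius for bounded data, ignores the differential-privacy noise and the data variance estimation, and fuses the kept peers with simple sample-count weights. The function names (`hoeffding_radius`, `same_mean`, `collaborative_estimate`) and the significance level `alpha` are hypothetical and chosen for illustration only.

```python
import math


def hoeffding_radius(n, L, alpha):
    """Two-sided Hoeffding radius for the mean of n samples in an interval of width 2L."""
    return L * math.sqrt(2.0 * math.log(2.0 / alpha) / n)


def same_mean(mean_a, n_a, mean_b, n_b, L, alpha=0.01):
    """Keep the 'same mean' hypothesis if the two confidence intervals overlap."""
    gap = abs(mean_a - mean_b)
    return gap <= hoeffding_radius(n_a, L, alpha) + hoeffding_radius(n_b, L, alpha)


def collaborative_estimate(own, peers, L, alpha=0.01):
    """Fuse an agent's own (sample_mean, n) pair with the peers that pass the test.

    Assumption: weights proportional to sample counts; the paper's linear
    statistic may weight peers differently.
    """
    kept = [own] + [p for p in peers if same_mean(own[0], own[1], p[0], p[1], L, alpha)]
    total = sum(n for _, n in kept)
    return sum(m * n for m, n in kept) / total


# Illustrative numbers: two agents whose sample means are close, one that is far away.
own = (0.01, 2000)
peers = [(-0.02, 2000), (1.0, 2000)]
fused = collaborative_estimate(own, peers, L=1.0)
```

In this toy example the peer with sample mean 1.0 is rejected, so the fused estimate averages only the two nearby agents. Using twice as many samples is what gives a collaborating agent a faster-shrinking error than a purely local one.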

Paper Structure

This paper contains 37 sections, 15 theorems, 52 equations, 4 figures, 1 algorithm.

Key Result

Lemma 1

Let $(x_1,\dotsc,x_n) \in \mathcal{X}^n$ where $\mathcal{X} = [\mu-L,\mu+L]$ for some finite values $\mu$ and $L$. Then, the noise-corrupted sample mean $(x_1+\cdots+x_n)/n + Z/n$, where $Z \sim \mathcal{N}\left(0,\sigma_{\mathrm{DP}}^2 \right)$ and $\sigma_{\mathrm{DP}}^2 \triangleq 8L^2 \ln(1.25/\
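
The lemma statement is cut off in this listing, so its exact form cannot be confirmed here. As a hedged illustration of a noise-corrupted sample mean of this kind, the sketch below uses the classical Gaussian-mechanism calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ with sensitivity $\Delta = 2L$ for the sum of values in $[\mu-L,\mu+L]$, which is valid for $0 < \epsilon < 1$. That calibration, and the names `gaussian_dp_sigma` and `private_sample_mean`, are assumptions made for this example rather than the paper's stated result.

```python
import math
import random


def gaussian_dp_sigma(L, epsilon, delta):
    """Noise standard deviation from the classical Gaussian mechanism.

    Assumption: the released quantity is the sum x_1 + ... + x_n, whose
    sensitivity is 2L when one value in [mu - L, mu + L] is replaced.
    """
    sensitivity = 2.0 * L
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_sample_mean(xs, L, epsilon, delta, rng):
    """Return (x_1 + ... + x_n)/n + Z/n with Z ~ N(0, sigma_DP^2)."""
    n = len(xs)
    z = rng.gauss(0.0, gaussian_dp_sigma(L, epsilon, delta))
    return sum(xs) / n + z / n


# Illustrative run with uniform data on [mu - L, mu + L]; all values are made up.
rng = random.Random(0)
mu, L = 0.0, 1.0
xs = [rng.uniform(mu - L, mu + L) for _ in range(10_000)]
estimate = private_sample_mean(xs, L, epsilon=0.5, delta=1e-6, rng=rng)
```

Because the noise is added once to the sum and then divided by $n$, its contribution to the estimate shrinks as more samples arrive.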

Figures (4)

  • Figure 1: System model.
  • Figure 2: Average mean squared error of Algorithm 1 with $\sigma=1/2$. The left plot shows simulation results with RR and rRR scheduling for the case of $M=200$ agents forming three classes, with an overall privacy level of $\epsilon=1$ with $\delta=10^{-6}$. The average of $20$ simulation runs is presented. The middle and right plots show the corresponding (analytical; see Proposition 2) performance with an oracle class estimator and with RR scheduling for $M=200$ and $30$ agents, respectively. The curves are for uniform data and $L = \sigma \sqrt{3}$.
  • Figure 3: (Left plot): Comparisons of the average mean squared error of Algorithm 1 with and without data variance estimation for the same scenario as in Figure 2 (left and middle plot) for PM-I with non-MoM weights and RR scheduling. For completeness, the performance of a fully local approach, PM-I with MoM weights, and an ideal scheme is also depicted. The average of $20$ simulation runs is presented. (Right plot): The initial part of the performance curves from the left plot up to time $t = 2000$.
  • Figure 4: (Left plot): Comparisons of the average mean squared error of Algorithm 1 with Gaussian and Laplace DP noise for the same scenario as in Figure 2, but for $M=15$ agents, for PM-I and PM-II with non-MoM weights and RR scheduling. The average of $100$ simulation runs is presented. For completeness, the performance of a fully local approach and an ideal scheme is also depicted. (Right plot): The corresponding (analytical; see Proposition 2) performance with Laplace DP noise and with an oracle class estimator.

Theorems & Definitions (20)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Definition 2: Wainwright19_1
  • Remark 1
  • Lemma 5
  • Lemma 6
  • Remark 2
  • ...and 10 more