Accurate, private, secure, federated U-statistics with higher degree

Quentin Sinh, Jan Ramon

TL;DR

This work proposes a protocol that securely computes U-statistics of degree $k \ge 2$ under central differential privacy by leveraging Multi-Party Computation (MPC), and substantially improves accuracy compared to prior solutions.

Abstract

We study the problem of computing a U-statistic with a kernel function $f$ of degree $k \ge 2$, i.e., the average of some function $f$ over all $k$-tuples of instances, in a federated learning setting. U-statistics of degree 2 include several useful statistics such as Kendall's $\tau$ coefficient, the Area under the Receiver-Operator Curve and the Gini mean difference. Existing methods provide solutions only under the lower-utility local differential privacy model and/or scale poorly in the size of the domain discretization. In this work, we propose a protocol that securely computes U-statistics of degree $k \ge 2$ under central differential privacy by leveraging Multi-Party Computation (MPC). Our method substantially improves accuracy compared to prior solutions. We provide a detailed theoretical analysis of its accuracy, communication and computational properties. We also evaluate its performance empirically and obtain favorable results: for Kendall's $\tau$ coefficient, for example, our approach reduces the Mean Squared Error by up to four orders of magnitude over existing baselines.
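To make the definition concrete, the following is a minimal, non-private sketch of a complete U-statistic of degree 2. It is not the paper's protocol: there is no MPC and no differential privacy. The names `u_statistic`, `gini_kernel` and `kendall_kernel` are hypothetical, and the sketch assumes a symmetric kernel, so that averaging over unordered $k$-subsets matches averaging over $k$-tuples.

```python
from itertools import combinations
from math import comb


def u_statistic(data, kernel, k=2):
    """Average of `kernel` over all k-subsets of `data` (symmetric kernel assumed)."""
    total = sum(kernel(*subset) for subset in combinations(data, k))
    return total / comb(len(data), k)


def gini_kernel(a, b):
    # Kernel of the Gini mean difference: absolute difference of a pair.
    return abs(a - b)


def kendall_kernel(p, q):
    # Kernel of Kendall's tau on 2-D points:
    # +1 for a concordant pair, -1 for a discordant pair, 0 on ties.
    dx, dy = p[0] - q[0], p[1] - q[1]
    return (dx * dy > 0) - (dx * dy < 0)


gini = u_statistic([1.0, 2.0, 4.0], gini_kernel)
tau = u_statistic([(1, 1), (2, 3), (3, 2)], kendall_kernel)
```

On the toy inputs, the three pairwise differences are 1, 3 and 2, so `gini` is 2.0. Two of the three point pairs are concordant and one is discordant, so `tau` is 1/3.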

Paper Structure (83 sections, 13 theorems, 62 equations, 10 figures, 3 tables, 1 algorithm)

Key Result

Lemma 2.7

Introduced by Dwork et al. (2006), the Laplace mechanism $\mathcal{M}_\text{Lap}$ for a function $f: \mathbb{X}^k \rightarrow \mathbb{R}$ is defined as $\mathcal{M}_\text{Lap}(x, f, \epsilon) = f(x) + \eta_\text{Lap}$ where $x \in \mathbb{X}^k$, $\epsilon \in \mathbb{R}$ and $\eta_\text{Lap}$ …
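The lemma statement above is cut off before it specifies the noise distribution. As a hedged illustration, the sketch below uses the standard textbook form of the Laplace mechanism, in which the noise is drawn from a Laplace distribution with scale equal to the sensitivity divided by $\epsilon$. This scale, the explicit `sensitivity` argument, and the function names are assumptions and are not taken from the truncated text.

```python
import random


def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_mechanism(x, f, epsilon, sensitivity, rng=random):
    """Return f(x) plus Laplace noise of scale sensitivity / epsilon.

    Assumption: this is the textbook calibration; the source lemma is truncated.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return f(x) + laplace_noise(sensitivity / epsilon, rng)


def mean(values):
    return sum(values) / len(values)


# Example: noisy mean of three values in [0, 1]; replacing one record
# changes the mean by at most 1/3, which is used as the sensitivity.
rng = random.Random(0)
noisy = laplace_mechanism([0.2, 0.4, 0.9], mean, epsilon=1.0, sensitivity=1 / 3, rng=rng)
```

The output is random, so repeated calls give different values centered on the true mean of 0.5.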

Figures (10)

  • Figure 1: Description of functionality $\mathcal{F}_f$ for computing $f$.
  • Figure 2: Description of functionality $\mathcal{F}_\text{noise}$ for computing the shared noise.
  • Figure 3: Online total communication cost over the number of parties (left) and MSE over the communication cost (right) for the computation of Gini mean difference for $\epsilon = 1$. The number of discretization bins is set to $t = 256$. Each data point $x_i$ is uniformly drawn from $[0, 1]$. For $\mathsf{Umpc}$, we sample $2 \%$ of all possible pairs for $|E|$, i.e., $0.02 \binom{n}{2}$.
  • Figure 4: Online per-party computation cost (left) and online server computation cost (right) over the number of parties $n$ for the computation of the Gini mean difference with $\epsilon = 1$. The number of discretization bins is $t = 256$. Each data point $x_i$ is uniformly drawn from $[0, 1]$. For $\mathsf{Umpc}$, we sample $2 \%$ of all possible pairs for $|E|$, i.e., $0.02 \binom{n}{2}$.
  • Figure 5: Online total communication cost (left) and online total computation cost (right) for computing Kendall's $\tau$ coefficient over the number of discretization bins $t$. The total computation cost is defined as the sum of the server computation cost and $n$ times the per-party computation cost. The communication and computation costs vary with $\epsilon = \{0, 1\}$ only for $\mathsf{Ghazi}$ and $\mathsf{GhaziSM}$. The dataset is taken from bank_marketing_222 and contains $n = 4521$ entries. For $\mathsf{Umpc}$, we sample $2 \%$ of all possible pairs for $|E|$, i.e., $|E| = 0.02 \binom{n}{2} \approx 204349$.
  • ...and 5 more figures
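The pair-sampling figure quoted in the captions can be checked with a few lines of arithmetic. The sketch below also shows a plain, non-private incomplete U-statistic that averages a kernel over a sampled edge set $E$. The function names are hypothetical, and uniform sampling without replacement is an assumption, since the captions do not say how $E$ is drawn.

```python
import random
from itertools import combinations
from math import comb


def sampled_pair_count(n, fraction=0.02):
    # Number of pairs kept when sampling `fraction` of all C(n, 2) pairs, rounded down.
    return int(fraction * comb(n, 2))


def incomplete_u_statistic(data, kernel, fraction=0.02, rng=random):
    """Average `kernel` over a uniformly sampled subset E of all pairs.

    Assumption: E is drawn without replacement; no MPC or DP noise is applied.
    """
    pairs = list(combinations(range(len(data)), 2))
    m = max(1, int(fraction * len(pairs)))
    edges = rng.sample(pairs, m)
    return sum(kernel(data[i], data[j]) for i, j in edges) / m


# For n = 4521: C(4521, 2) = 10,217,460 pairs, and 2% of that is about 204,349.
edge_count = sampled_pair_count(4521)
```

With `fraction=1.0` every pair is used, so the function reduces to the complete U-statistic.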

Theorems & Definitions (37)

  • Definition 2.1: U-statistic
  • Definition 2.2: Incomplete U-statistic
  • Definition 2.3: Secret Sharing
  • Definition 2.4: Central DP
  • Definition 2.5: Local DP
  • Definition 2.6: Sensitivity
  • Lemma 2.7: Laplace mechanism
  • Lemma 2.8: Gaussian mechanism
  • Remark 3.1
  • Lemma 4.1: Correctness
  • ...and 27 more