Differentially Private Federated Learning without Noise Addition: When is it Possible?

Jiang Zhang; Konstantinos Psounis; Salman Avestimehr

TL;DR

This work formally identifies a necessary condition under which Secure Aggregation (SA) can provide differential privacy (DP) without additional noise. It proves that when the randomness inside the aggregated model update is Gaussian with a non-singular covariance matrix, SA provides DP guarantees, with the privacy level $ε$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix.

Abstract

Federated Learning (FL) with Secure Aggregation (SA) has gained significant attention as a privacy-preserving framework for training machine learning models while preventing the server from learning information about users' data from their individual encrypted model updates. Recent research has extended the privacy guarantees of FL with SA by bounding the information leakage through the aggregate model over multiple training rounds, leveraging the "noise" from other users' updates. However, the privacy metric used in that work (mutual information) measures the average-case privacy leakage and provides no privacy guarantees for worst-case scenarios. To address this, in this work we study the conditions under which FL with SA can provide worst-case differential privacy (DP) guarantees. Specifically, we formally identify a necessary condition under which SA can provide DP without additional noise. We then prove that when the randomness inside the aggregated model update is Gaussian with a non-singular covariance matrix, SA can provide differential privacy guarantees, with the privacy level $ε$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix. However, we further demonstrate that in practice these conditions are unlikely to hold, and hence additional noise added to model updates is still required for SA in FL to achieve DP. Lastly, we discuss a potential solution that leverages the inherent randomness inside the aggregated model update to reduce the amount of additional noise required for a DP guarantee.
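To illustrate why the minimum eigenvalue of the covariance matrix controls the privacy level, the following minimal sketch uses the standard closed form for the Rényi divergence between two Gaussians with a shared covariance $\Sigma$, namely $\frac{\alpha}{2}\Delta^\top \Sigma^{-1} \Delta$, which is at most $\frac{\alpha \|\Delta\|^2}{2\lambda_{\min}(\Sigma)}$. The 2x2 covariance, the function names, and the example values are illustrative assumptions and are not taken from the paper; the paper's exact bound and constants may differ.

```python
import math

def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius

def renyi_divergence_shared_cov(diff, a, b, c, alpha):
    """Renyi divergence of order alpha between N(mu, S) and N(mu', S).

    diff is mu - mu' and S = [[a, b], [b, c]] must be positive definite.
    Uses the closed form (alpha / 2) * diff^T S^{-1} diff.
    """
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be non-singular (positive definite)")
    d0, d1 = diff
    # Quadratic form with the explicit 2x2 inverse (1/det) * [[c, -b], [-b, a]].
    quad = (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    return alpha / 2.0 * quad

def renyi_upper_bound(diff, lam_min, alpha):
    """Upper bound alpha * ||diff||^2 / (2 * lambda_min) on the divergence."""
    sq_norm = diff[0] ** 2 + diff[1] ** 2
    return alpha * sq_norm / (2.0 * lam_min)

# Hypothetical example values (not from the paper).
a, b, c = 2.0, 0.5, 1.0          # covariance entries of the aggregate randomness
diff = (1.0, -1.0)               # shift in the mean caused by changing one user's data
alpha = 2.0

lam_min = min_eigenvalue_2x2(a, b, c)
exact = renyi_divergence_shared_cov(diff, a, b, c, alpha)
bound = renyi_upper_bound(diff, lam_min, alpha)
print(f"lambda_min={lam_min:.4f} exact={exact:.4f} bound={bound:.4f}")
```

As the minimum eigenvalue approaches zero, the bound grows without limit, which matches the intuition that a singular covariance leaves some direction with no randomness to hide a user's contribution.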

Paper Structure (22 sections, 6 theorems, 21 equations, 5 figures, 1 table)

Introduction
Preliminaries
Federated learning
Secure aggregation guarantees
Differential privacy
Threat model for FL with SA
Problem Statement
Motivation
Negative result for DP
What we need for DP guarantee
Theoretical Results
Basic assumption for gradient noise
Necessary condition for DP guarantee
Gaussian sampling noise with non-singular covariance matrix
Gaussian sampling noise with singular gradient covariance matrix
...and 7 more sections

Key Result

Lemma 1

A mechanism that satisfies $(\alpha, \epsilon)$-RDP also satisfies $(\epsilon + \frac{\log(1/\delta)}{\alpha-1}, \delta)$-DP for any $\delta > 0$.
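The conversion in Lemma 1 can be sketched in a few lines. The function name and the example parameter values below are illustrative assumptions; the restriction to $\delta < 1$ is a practical choice in this sketch, since larger values give a vacuous guarantee.

```python
import math

def rdp_to_dp(alpha, rdp_epsilon, delta):
    """Convert an (alpha, rdp_epsilon)-RDP guarantee into an (epsilon, delta)-DP one.

    Implements epsilon = rdp_epsilon + log(1 / delta) / (alpha - 1) from Lemma 1.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)

# Hypothetical example: a (10, 0.5)-RDP mechanism evaluated at delta = 1e-5.
epsilon = rdp_to_dp(alpha=10.0, rdp_epsilon=0.5, delta=1e-5)
print(f"epsilon = {epsilon:.4f}")
```

Because the second term shrinks as $\alpha$ grows while the RDP $\epsilon$ typically increases with $\alpha$, in practice one often evaluates the conversion over several orders $\alpha$ and keeps the smallest resulting $\epsilon$.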

Figures (5)

Figure 1: Federated learning with SA and DP guarantees.
Figure 2: System model for FL with SA. Note that the input of this system is the users' local datasets ($\{D_i\}_{i=1}^{i=N}$), and the output of the system is the aggregated model update ($\sum_{i=1}^{i=N}x_i^{(t)}$), which is a random vector due to users' local gradient (i.e., data batch) sampling. The server attempts to infer user $i$'s local dataset ($D_i$) by observing $\sum_{i=1}^{i=N}x_i^{(t)}$.
Figure 3: Heatmap of the absolute values of sampled updates from users $1$, $2$, and $3$ in the counterexample. $x_4$ and $x_4'$ can be distinguished even after adding the aggregated noise from $\sum_{i=1}^3 x_i$.
Figure 4: Comparison of WF noise and isotropic noise.
Figure 5: Comparison of different DP mechanisms on the MNIST dataset. We consider 50 users participating in FL. The number of training epochs is set to 100, the mini-batch size $B$ is 32, the clipping value $C$ is set to 10, and we consider $\delta=10^{-4}$. We report the cumulative privacy loss across all training epochs using the composition theorem in [kairouz2015composition].

Theorems & Definitions (9)

Definition 1: DP [dwork2014algorithmic]
Definition 2: Rényi Divergence [gil2013renyi]
Definition 3: $(\alpha, \epsilon)$-RDP [mironov2017renyi]
Lemma 1: From RDP to $(\epsilon,\delta)$-DP [mironov2017renyi]
Theorem 1: A necessary condition for DP guarantee
Lemma 2: Bounded maximal singular value
Theorem 2
Theorem 3
Theorem 4: DP guarantees of Gaussian sampling noise + WF-NA algorithm
