Differentially-Private Multi-Tier Federated Learning

Evan Chen; Frank Po-Chen Lin; Dong-Jun Han; Christopher G. Brinton

TL;DR

This work proposes Multi-Tier Federated Learning with Multi-Tier Differential Privacy (M$^2$FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks, and provides a comprehensive analysis of its convergence behavior.

Abstract

While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Multi-Tier Federated Learning with Multi-Tier Differential Privacy (M$^2$FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. A key idea of M$^2$FDP is to extend hierarchical differential privacy (HDP) to Multi-Tier Differential Privacy (MDP), adapting DP noise injection at the different layers of an established FL hierarchy (edge devices, edge servers, and cloud servers) according to the trust models within particular subnetworks. We conduct a comprehensive analysis of the convergence behavior of M$^2$FDP, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. Subsequent numerical evaluations demonstrate that M$^2$FDP obtains substantial improvements in these metrics over baselines for different privacy budgets, and validate the impact of different system configurations.

Paper Structure (22 sections, 9 theorems, 43 equations, 4 figures, 1 algorithm)

Introduction
Preliminaries and System Model
Differential Privacy (DP)
Multi-Tier Network System Model
Machine Learning Model
Proposed Methodology
Model Training Timescales
M$^2$FDP Training and Aggregations
DP Mechanisms
Convergence Analysis
Analysis Assumptions
Preliminary Quantities
General Convergence Behavior of M$^2$FDP
Experimental Evaluation
Simulation Setup
...and 7 more sections

Key Result

Proposition 1

Under Assumption assump:genLoss, there exist constants $c_1$ and $\alpha_l$ such that, given the data sampling probability $q$ at each device and the total number of aggregations $L$ conducted during the model training process, for any $\epsilon<c_1qL$, M$^2$FDP exhibits $(\epsilon,\delta)$-differ... Here, $\Delta_{l,c}$ represents the $L_2$-norm sensitivity of the gradients exchanged during the agg...
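Proposition 1 relates the privacy level to an $L_2$-norm sensitivity of the exchanged gradients, but the noise calibration itself is not visible in this excerpt. As a rough illustration only, the sketch below applies the classical Gaussian mechanism to a single clipped gradient: noise with standard deviation $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ gives $(\epsilon,\delta)$-DP for one release when $0<\epsilon<1$. It does not account for the sampling probability $q$ or for composition over $L$ aggregations, and the names `gaussian_sigma`, `clip_l2`, `privatize`, and `max_norm` are hypothetical, not taken from the paper.

```python
import math

import numpy as np


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism noise scale for a single release.

    Assumption: this is the textbook bound (valid for 0 < epsilon < 1),
    not the calibration used by M^2FDP.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical bound requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip_l2(grad: np.ndarray, max_norm: float) -> np.ndarray:
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = float(np.linalg.norm(grad))
    return grad if norm <= max_norm else grad * (max_norm / norm)


def privatize(grad, max_norm, epsilon, delta, rng):
    """Clip a gradient and add Gaussian noise scaled to its sensitivity."""
    clipped = clip_l2(np.asarray(grad, dtype=float), max_norm)
    # Replacing one record can move a clipped gradient by at most 2 * max_norm.
    sigma = gaussian_sigma(2.0 * max_norm, epsilon, delta)
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(privatize([3.0, 4.0], max_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Clipping is what makes the sensitivity bounded in the first place; without it, a single outlier gradient could require arbitrarily large noise.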

Figures (4)

Figure 1: Multi-tier network architecture with four layers in total ($L = 3$), where $\mathcal{S}_1$ and $\mathcal{S}_2$ are the sets of edge servers between the devices and the main server, and $\mathcal{S}_3$ is the set of edge devices computing local gradient updates. During local aggregation at each layer $l$, the noise added to a child node's model is a linear combination of existing noises if the child node is in the set of insecure edge servers $\mathcal{N}_{U,l+1}$. Otherwise, if the child node is in the set of secure edge servers $\mathcal{N}_{T,l+1}$, a new noise that guarantees differential privacy is generated and injected into the child node's model.
Figure 2: Performance comparison between M$^2$FDP, the HFL-DP baseline from Shi2021HDP, and an upper bound established by hierarchical FedAvg without DP. M$^2$FDP significantly outperforms HFL-DP and is able to leverage trusted edge servers effectively.
Figure 3: Interplay between privacy and performance in M$^2$FDP across various probabilities ($p_1$) of a subnet's linkage to a secure edge server under different privacy budgets ($\epsilon$).
Figure 4: Impact of various network configurations on the performance of M$^2$FDP. For the same network size, increasing the size of each subnet $s_1$ yields higher test accuracy than merely increasing the number of subnets $N_1$.
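The trust-dependent noise injection described in the Figure 1 caption can be sketched for a single edge-server tier as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the function names, the `trusted` flag, and the noise scales `sigma_device` and `sigma_edge` are hypothetical, and plain averaging stands in for the aggregation rule. Below an insecure edge server, each device perturbs its own update, so the forwarded model carries a linear combination (here, the mean) of those noises. A secure edge server receives clean updates and injects one fresh noise draw after averaging.

```python
import numpy as np


def edge_aggregate(device_updates, trusted, sigma_device, sigma_edge, rng):
    """Return the model an edge server forwards to its parent node.

    trusted=False: devices add noise before upload, so the average contains
    the mean of the device noises (a linear combination of existing noises).
    trusted=True: devices upload clean updates and the edge server adds one
    new noise draw to the averaged model.
    """
    updates = np.asarray(device_updates, dtype=float)
    if trusted:
        mean = updates.mean(axis=0)
        return mean + rng.normal(0.0, sigma_edge, size=mean.shape)
    noisy = updates + rng.normal(0.0, sigma_device, size=updates.shape)
    return noisy.mean(axis=0)


def cloud_aggregate(subnets, sigma_device, sigma_edge, rng):
    """Average edge-server outputs; subnets is a list of (updates, trusted)."""
    edge_models = [
        edge_aggregate(updates, trusted, sigma_device, sigma_edge, rng)
        for updates, trusted in subnets
    ]
    return np.mean(edge_models, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    subnets = [
        ([[1.0, 2.0], [3.0, 4.0]], True),   # subnet behind a secure edge server
        ([[5.0, 6.0], [7.0, 8.0]], False),  # subnet behind an insecure edge server
    ]
    print(cloud_aggregate(subnets, sigma_device=0.1, sigma_edge=0.05, rng=rng))
```

In the insecure branch, the mean of $n$ independent device noises has standard deviation `sigma_device / sqrt(n)`, which is one way to see why subnet size matters for the noise that reaches the upper tiers.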

Theorems & Definitions (17)

Definition 1: ($\epsilon$,$\delta$)-DP Dwork2014DP
Proposition 1: Gaussian Mechanism Abadi2019MA
Lemma 1
Theorem 1
Corollary 1
Theorem 1
proof
Corollary 1
proof
Lemma 1
...and 7 more
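
For reference, the standard form of $(\epsilon,\delta)$-differential privacy, which Definition 1 cites, can be written as below. The symbols $\mathcal{M}$ (randomized mechanism), $\mathcal{D}, \mathcal{D}'$ (adjacent datasets differing in one record), and $\mathcal{S}$ (set of outputs) are the usual textbook notation and are assumptions here; the paper's own notation may differ.

```latex
\[
  \Pr\bigl[\mathcal{M}(\mathcal{D}) \in \mathcal{S}\bigr]
  \;\le\;
  e^{\epsilon}\,\Pr\bigl[\mathcal{M}(\mathcal{D}') \in \mathcal{S}\bigr] + \delta
  \qquad
  \text{for all adjacent } \mathcal{D}, \mathcal{D}'
  \text{ and all } \mathcal{S} \subseteq \operatorname{Range}(\mathcal{M}).
\]
```

A smaller $\epsilon$ forces the two output distributions closer together, and $\delta$ bounds the probability with which that guarantee may fail.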
