Differentially-Private Multi-Tier Federated Learning: A Formal Analysis and Evaluation

Evan Chen, Frank Po-Chen Lin, Dong-Jun Han, Christopher G. Brinton

TL;DR

The paper tackles the challenge of preserving privacy in multi-tier federated learning (MFL) by integrating differential privacy (DP) across edge, fog, and cloud layers with heterogeneous trust models. It proposes M$^2$FDP, a DP-enhanced MFL framework that adaptively injects noise across the network hierarchy, and provides a non-convex convergence analysis showing sublinear convergence to a controllable stationary region influenced by trust and topology. An adaptive control algorithm is developed to jointly optimize step size, local training intervals, and participant selection to balance energy, latency, and accuracy while meeting DP guarantees. Empirical results demonstrate significant improvements in convergence speed, energy efficiency, and latency over baselines, especially when secure intermediate nodes are prevalent. These results highlight the practical viability of privacy-preserving, multi-tier ML in heterogeneous edge/fog/cloud deployments.

Abstract

While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. Differential privacy (DP) is often employed to address such issues. However, the impact of DP on FL in multi-tier networks, where hierarchical aggregations couple noise injection decisions at different tiers and trust models are heterogeneous across subnetworks, is not well understood. To fill this gap, we develop Multi-Tier Federated Learning with Multi-Tier Differential Privacy (M$^2$FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance over such networks. One of the key principles of M$^2$FDP is to adapt DP noise injection across the established edge/fog computing hierarchy (e.g., edge devices, intermediate nodes, and other tiers up to cloud servers) according to the trust models in different subnetworks. We conduct a comprehensive analysis of the convergence behavior of M$^2$FDP under non-convex problem settings, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. We show how these relationships can be employed to develop an adaptive control algorithm for M$^2$FDP that tunes properties of local model training to minimize energy, latency, and the stationarity gap while meeting desired convergence and privacy criteria. Subsequent numerical evaluations demonstrate that M$^2$FDP obtains substantial improvements in these metrics over baselines for different privacy budgets and system configurations.
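
The trust-dependent noise placement described in the abstract can be illustrated with a minimal two-tier sketch. It is not the paper's algorithm: the function name `aggregate_subnet`, the parameters `sigma` and `server_trusted`, and the single-round, equal-weight averaging are all hypothetical. The sketch also assumes that averaging over $n$ devices reduces the sensitivity of the aggregate by a factor of $1/n$, so a trusted edge server can add noise with standard deviation `sigma / n` once, whereas devices under an untrusted server each add noise with standard deviation `sigma` before uploading.

```python
import numpy as np

def aggregate_subnet(device_models, sigma, server_trusted, rng):
    """Average one subnetwork's models under a simplified trust rule.

    All names here are illustrative assumptions, not the paper's notation.
    """
    models = np.asarray(device_models, dtype=float)
    n = models.shape[0]
    if server_trusted:
        # Trusted edge server: average the clean models, then add noise once.
        # Assumption: the average has sensitivity 1/n of a single model,
        # so the noise standard deviation is scaled to sigma / n.
        avg = models.mean(axis=0)
        return avg + rng.normal(0.0, sigma / n, size=avg.shape)
    # Untrusted edge server: each device perturbs its own model before upload,
    # so the server only ever sees noisy models.
    noisy = models + rng.normal(0.0, sigma, size=models.shape)
    return noisy.mean(axis=0)

rng = np.random.default_rng(0)
models = np.zeros((10, 10_000))  # 10 devices, identical all-zero models
trusted_out = aggregate_subnet(models, 1.0, True, rng)
untrusted_out = aggregate_subnet(models, 1.0, False, rng)
# Residual noise in the aggregate is smaller when the server is trusted.
print(trusted_out.std(), untrusted_out.std())
```

Under these assumptions the aggregate's noise standard deviation is `sigma / n` in the trusted case and `sigma / sqrt(n)` in the untrusted case, which is consistent with the abstract's point that the trust model affects the achievable stationarity gap.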

Paper Structure

This paper contains 35 sections, 10 theorems, 56 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Under Assumption assump:genLoss, there exist constants $c_1$ and $\alpha_l$ such that given the data sampling probability $q$ at each device, and the total number of aggregations $T$ conducted during the model training process, for any $\epsilon<c_1qT$, M$^2$FDP exhibits $(\epsilon,\delta)$-differ... Here, $\Delta_{l,c}$ represents the $L_2$-norm sensitivity of the gradients exchanged during the agg...
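
For reference, the two notions the proposition relies on can be written in their standard textbook form. The symbols $\mathcal{M}$ (randomized mechanism), $D, D'$ (adjacent datasets), $S$ (set of outputs), and $f$ (the released quantity, here an exchanged gradient) are notation assumed for this note and are not taken from the paper; the paper's $\Delta_{l,c}$ is a sensitivity of this kind, presumably indexed by layer and node.

```latex
% (\epsilon,\delta)-differential privacy: for all adjacent D, D' and all output sets S,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .

% L_2-norm sensitivity of a function f over adjacent datasets:
\Delta \;=\; \max_{D,\,D' \text{ adjacent}} \bigl\| f(D) - f(D') \bigr\|_2 .
```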

Figures (12)

  • Figure 1: Multi-tier FL network architecture studied in this work with total $L = 3$ layers beneath the main server. $\mathcal{S}_1$ and $\mathcal{S}_2$ are the set of edge servers between the devices and the main server, and $\mathcal{S}_3$ is the set of edge devices computing local gradient updates. The edge devices are further grouped into children nodes of layer $l = 2$, e.g., $\mathcal{S}_{2,c_2}$ and $\mathcal{S}_{2,c'_2}$.
  • Figure 2: Illustration of training timescales in M$^2$FDP for the case of $L=3$ in Fig. 1. $K^t$ total local gradient updates are performed between global iterations $t$ and $t+1$. Within the time interval of local iterations $k \in [1, K^t]$, multiple local aggregations from edge devices towards edge servers at layers $l=1$ (i.e., $\mathcal{K}_1^t$) and $l=2$ (i.e., $\mathcal{K}_2^t$) are performed.
  • Figure 3: Illustration of how noise is injected through aggregation to protect models on insecure servers. During local aggregation at each layer $l$, the noise added to the child node's model is a linear combination of existing noises if the child node is in the set of insecure edge servers $\mathcal{N}_{U,l+1}$. Otherwise, if the child node is in the set of secure edge servers $\mathcal{N}_{T,l+1}$, new noise that meets the target DP criterion is generated and injected into the child node's model.
  • Figure 4: Overview of the adaptive control algorithm, outlining its objectives, adjustable parameters, and observations.
  • Figure 5: Performance comparison between M$^2$FDP, the HFL-DP baseline from [Shi2021HDP], and an upper bound established by hierarchical FedAvg without DP. M$^2$FDP significantly outperforms HFL-DP and is able to leverage trusted edge servers effectively.
  • ...and 7 more figures
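
The nested timescales in the Figure 2 caption can be made concrete with a small sketch. It assumes, purely for illustration, that each layer aggregates periodically; the function `aggregation_schedule` and the period values are hypothetical and do not come from the paper, which only states that the sets $\mathcal{K}_1^t$ and $\mathcal{K}_2^t$ of aggregation instants lie within $k \in [1, K^t]$.

```python
def aggregation_schedule(K, periods):
    """Return {layer: [local iterations k in 1..K at which that layer aggregates]}.

    Assumption for illustration: layer l aggregates every periods[l] local iterations.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    schedule = {}
    for layer, tau in periods.items():
        if tau < 1:
            raise ValueError("aggregation periods must be at least 1")
        schedule[layer] = [k for k in range(1, K + 1) if k % tau == 0]
    return schedule

# 12 local iterations between two global iterations; layer 2 (closest to the
# devices) aggregates every 3 iterations, layer 1 every 6.
schedule = aggregation_schedule(K=12, periods={1: 6, 2: 3})
print(schedule)  # {1: [6, 12], 2: [3, 6, 9, 12]}
```

Because the layer-1 period in this example is a multiple of the layer-2 period, every layer-1 aggregation instant is also a layer-2 instant; that nesting is a property of the chosen periods, not something the caption guarantees.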

Theorems & Definitions (17)

  • Definition 1: ($\epsilon$,$\delta$)-DP [Dwork2014DP]
  • Proposition 1: Gaussian Mechanism [Abadi2019MA]
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Corollary 1
  • ...and 7 more
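
As a point of reference for the Gaussian mechanism listed above, the following sketch uses the classical calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$, which is valid for $0 < \epsilon < 1$. This is the textbook rule and may differ from the constants in the paper's Proposition 1; the function names `gaussian_sigma`, `clip_l2`, and `gaussian_mechanism` are hypothetical. Clipping is included because it is a common way to enforce a bounded $L_2$ sensitivity on gradients.

```python
import math
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise std of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def clip_l2(vec, max_norm):
    """Rescale vec so that its L2 norm is at most max_norm."""
    vec = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(vec)
    return vec if norm <= max_norm else vec * (max_norm / norm)

def gaussian_mechanism(vec, epsilon, delta, sensitivity, rng):
    """Add i.i.d. Gaussian noise calibrated to the given L2 sensitivity."""
    vec = np.asarray(vec, dtype=float)
    sigma = gaussian_sigma(epsilon, delta, sensitivity)
    return vec + rng.normal(0.0, sigma, size=vec.shape)

rng = np.random.default_rng(0)
grad = clip_l2([3.0, 4.0], max_norm=1.0)  # norm 5 is rescaled to norm 1
# Assumption: replacing one record changes the clipped gradient by at most
# 2 * max_norm in L2 norm, so the sensitivity passed in is 2.0.
private_grad = gaussian_mechanism(grad, epsilon=0.5, delta=1e-5,
                                  sensitivity=2.0, rng=rng)
```

The sensitivity value depends on the adjacency notion: under add-or-remove adjacency it would be `max_norm` rather than `2 * max_norm`.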