Resource-Constrained Decentralized Federated Learning via Personalized Event-Triggering

Shahryar Zehtabi; Seyyedali Hosseinalipour; Christopher G. Brinton

TL;DR

The paper addresses resource-constrained, fully decentralized FL over time-varying device-to-device (D2D) networks without a central server. It introduces EF-HC, an asynchronous, event-triggered communication framework in which each device uses a personalized, bandwidth-aware threshold to balance local updates against communications. The authors prove last-iterate convergence at an $O(\ln{k}/\sqrt{k})$ rate by showing that the information-flow graph remains $B$-connected despite sporadic communications, and extensive simulations show communication savings and faster convergence relative to baselines. The work has practical implications for edge/fog networks with heterogeneous devices and varying connectivity.
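For intuition on where a rate of the form $O(\ln k/\sqrt{k})$ typically comes from, consider a generic diminishing-step-size argument. This is not the paper's last-iterate proof: it assumes, hypothetically, an error bound $\epsilon_k$ of the averaged form in the first line, with constants $C_1, C_2 > 0$ and step size $\alpha_s = \alpha_0/\sqrt{s+1}$.

```latex
\begin{align*}
\epsilon_k &\le \frac{C_1 + C_2 \sum_{s=0}^{k-1} \alpha_s^2}{\sum_{s=0}^{k-1} \alpha_s},
  \qquad \alpha_s = \frac{\alpha_0}{\sqrt{s+1}}, \quad k \ge 1, \\
\sum_{s=0}^{k-1} \alpha_s &= \alpha_0 \sum_{t=1}^{k} \frac{1}{\sqrt{t}}
  \;\ge\; \alpha_0 \, k \cdot \frac{1}{\sqrt{k}} = \alpha_0 \sqrt{k}, \\
\sum_{s=0}^{k-1} \alpha_s^2 &= \alpha_0^2 \sum_{t=1}^{k} \frac{1}{t}
  \;\le\; \alpha_0^2 \, (1 + \ln k), \\
\Longrightarrow \quad \epsilon_k &\le \frac{C_1 + C_2 \, \alpha_0^2 \, (1 + \ln k)}{\alpha_0 \sqrt{k}}
  = O\!\left( \frac{\ln k}{\sqrt{k}} \right).
\end{align*}
```

The denominator grows like $\sqrt{k}$ because each of its $k$ terms is at least $1/\sqrt{k}$, while the numerator is a harmonic sum bounded by $1 + \ln k$; their ratio produces the logarithmic factor.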

Abstract

Federated learning (FL) is a popular technique for distributing machine learning (ML) across a set of edge devices. In this paper, we study fully decentralized FL, where, in addition to training locally, devices carry out model aggregations via cooperative consensus formation over device-to-device (D2D) networks. We introduce asynchronous, event-triggered communications among the devices to handle settings where access to a central server is not feasible. To account for the inherent resource heterogeneity and statistical diversity challenges in FL, we define personalized communication triggering conditions at each device that weigh the change in local model parameters against the available local network resources. We show theoretically that our methodology recovers the $O(\ln{k} / \sqrt{k})$ convergence rate of decentralized gradient descent (DGD) methods to the globally optimal model. Our convergence guarantees hold for the last iterates of the models, under graph connectivity and data heterogeneity assumptions that are more relaxed than those in the existing literature. To do so, we demonstrate a $B$-connected information flow guarantee in the presence of sporadic communications over the time-varying D2D graph. Our numerical evaluations demonstrate that our methodology obtains substantial improvements in convergence speed and/or communication savings compared to existing decentralized FL baselines.
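The triggering idea described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not EF-HC itself: the quadratic local losses, the static ring topology, the per-device bandwidths $b_i$, and the trigger threshold $r \rho^k / b_i$ (larger for low-bandwidth devices, decaying over time) are all hypothetical stand-ins for the paper's definitions. Each iteration performs (1) event-triggered broadcasts, (2) a consensus step that uses only the most recently broadcast models, and (3) a local gradient step with step size $\alpha_0/\sqrt{k+1}$.

```python
import numpy as np


def run_event_triggered_dgd(num_devices=8, dim=2, iters=2000, alpha0=0.5,
                            beta=0.2, r=1.0, rho=0.995, seed=0):
    """Hypothetical event-triggered DGD sketch; returns (models, optimum, #broadcasts)."""
    rng = np.random.default_rng(seed)
    # Hypothetical local objectives f_i(x) = 0.5 * ||x - c_i||^2; the minimizer
    # of sum_i f_i is the mean of the targets c_i.
    targets = rng.normal(size=(num_devices, dim))
    # Hypothetical per-device bandwidths; a lower bandwidth yields a higher threshold.
    bandwidth = rng.uniform(0.5, 1.5, size=num_devices)
    # Static undirected ring topology, chosen only to keep the sketch short.
    neighbors = [((i - 1) % num_devices, (i + 1) % num_devices)
                 for i in range(num_devices)]

    x = np.zeros((num_devices, dim))  # current local models
    x_hat = x.copy()                  # last broadcast models, as seen by neighbors
    broadcasts = 0
    for k in range(iters):
        # (1) Broadcast only if the model drifted past the personalized threshold.
        threshold = r * rho**k / bandwidth
        for i in range(num_devices):
            if np.linalg.norm(x[i] - x_hat[i]) >= threshold[i]:
                x_hat[i] = x[i].copy()
                broadcasts += 1
        # (2) Consensus step built from broadcast copies only (sum-preserving on an
        #     undirected graph, so the network-average model is unchanged).
        mix = np.zeros_like(x)
        for i in range(num_devices):
            for j in neighbors[i]:
                mix[i] += x_hat[j] - x_hat[i]
        x = x + beta * mix
        # (3) Local gradient step with diminishing step size alpha0 / sqrt(k + 1).
        alpha = alpha0 / np.sqrt(k + 1)
        x = x - alpha * (x - targets)
    optimum = targets.mean(axis=0)
    return x, optimum, broadcasts


models, optimum, broadcasts = run_event_triggered_dgd()
print("broadcasts:", broadcasts, "of", 8 * 2000, "possible")
print("max distance to optimum:", np.linalg.norm(models - optimum, axis=1).max())
```

Because the ring is undirected and every neighbor sees the same broadcast copy, the consensus step preserves the network-average model, so the average converges to the global optimum while devices skip broadcasts whenever their models have not drifted past their personalized thresholds.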

Paper Structure (30 sections, 23 theorems, 124 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Related Work
Consensus-based distributed optimization
Resource-efficient federated learning
Outline and Summary of Contributions
Notations
Methodology and Algorithm
Device and Learning Model
Network Model and Event-Triggering
Iterate Relations
Convergence Analysis
Assumptions
Main Connectivity Result
Intermediate Lemmas for Convergence
Main Convergence Results
...and 15 more sections

Key Result

Proposition 3.6

Let Assumption assump:conn hold. Under the EF-HC algorithm (Alg. alg:efhc), the information flow graph $\mathcal{G}'^{(k)}$ is $B$-connected, i.e., $\mathcal{G}'^{( k : k+B-1 )} = {( \mathcal{M}, \cup_{s=0}^{B-1}{\mathcal{E}'^{( k+s )}} )}$ is connected for any $k \geq 0$, where $B = ( \tilde{l}+2
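The $B$-connectivity property in Proposition 3.6 can be checked numerically on a finite trace of edge sets. The sketch below assumes each $\mathcal{E}'^{(k)}$ is given as a set of undirected edges $(i, j)$ over devices $0, \dots, |\mathcal{M}|-1$; the function names are hypothetical. Since the proposition is a statement for every $k \geq 0$, a finite trace can only refute it or support it over the observed window.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def is_connected(num_nodes: int, edges: Iterable[Edge]) -> bool:
    """Union-find check that an undirected graph on nodes 0..num_nodes-1 is connected."""
    parent = list(range(num_nodes))

    def find(u: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    components = num_nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def is_b_connected(num_nodes: int, edge_seq: List[Set[Edge]], B: int) -> bool:
    """Return True if the union of every B consecutive edge sets in edge_seq is connected."""
    if B < 1 or len(edge_seq) < B:
        raise ValueError("need B >= 1 and at least B edge sets")
    for k in range(len(edge_seq) - B + 1):
        window: Set[Edge] = set().union(*edge_seq[k:k + B])
        if not is_connected(num_nodes, window):
            return False
    return True


# Rounds alternate between {(0, 1), (2, 3)} and {(1, 2)} on four devices.
trace = [{(0, 1), (2, 3)}, {(1, 2)}] * 3
print(is_b_connected(4, trace, 1), is_b_connected(4, trace, 2))
```

In this example no single round is connected, so the check fails for $B = 1$, but every pair of consecutive rounds forms the path 0-1-2-3, so the trace is 2-connected.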

Figures (8)

Figure 1: System diagram of a time-varying decentralized system, illustrating the four events of Alg. 1, namely (i) neighbor connection, (ii) model broadcast, (iii) model aggregation, and (iv) stochastic gradient descent.
Figure 2: Performance comparison between our method (EF-HC), global threshold (GT), zero threshold (ZT), and randomized gossip (RG) algorithms on (a) FMNIST and (b) FEMNIST datasets using an SVM model. The link bandwidths among devices are drawn from a uniform distribution $\mathcal{U}\left( (1-\sigma_N)b_M, (1+\sigma_N)b_M \right)$, and the devices are connected to each other via a random geometric graph. The plots show (i) transmission time per iteration, (ii) accuracy per iteration, (iii) accuracy per transmission time, and (iv) accuracy after a given number of transmissions as a function of graph connectivity. EF-HC achieves higher accuracies with less elapsed transmission time (Figs. 2(a)-(iii) and 2(b)-(iii)), and its advantage remains consistent across different graph connectivities (Figs. 2(a)-(iv) and 2(b)-(iv)).
Figure 3: Performance comparison between our method (EF-HC) and the baselines on the FMNIST dataset. For this figure, the link bandwidths among devices are drawn from a beta distribution, $\mathrm{Beta}(0.5, 0.5) \cdot b_M$. In Figs. fig:sim_new:svm_beta and fig:sim_new:svm_internet, the devices are connected via a random geometric graph and the Internet graph, respectively. Regardless of the network topology, EF-HC reaches higher accuracies faster in terms of total elapsed transmission time. Moreover, comparing Fig. 2(a) with Fig. fig:sim_new:svm_beta shows that our methodology outperforms the baselines whether the link bandwidths are sampled from the uniform or the beta distribution. (The connectivity of the Internet graph is fixed and, unlike that of the random geometric graph, cannot be varied.)
Figure 4: Performance comparison between our method (EF-HC) and the baselines using a CNN classifier on the FMNIST dataset. We use a random geometric graph as the network topology and sample the link bandwidths in two ways: (a) a uniform distribution $\mathcal{U}\left( (1-\sigma_N)b_M, (1+\sigma_N)b_M \right)$ and (b) a beta distribution $\mathrm{Beta}(0.5, 0.5) \cdot b_M$. The advantage of EF-HC also holds for this non-convex model.
Figure 5: Illustration of the definitions and lemmas used in the appendix proof of Proposition 3.6.
...and 3 more figures
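The two bandwidth settings in the captions of Figures 2-4 can be reproduced in spirit as follows. This is a hypothetical sketch: the unit-square node placement, the connection radius, the helper names, and the default $\sigma_N$ are assumptions rather than the paper's experimental parameters; only the two distributions, $\mathcal{U}\left( (1-\sigma_N)b_M, (1+\sigma_N)b_M \right)$ and $\mathrm{Beta}(0.5, 0.5) \cdot b_M$, come from the captions.

```python
import numpy as np


def random_geometric_graph(num_nodes, radius, rng):
    """Place nodes uniformly in the unit square; link pairs within `radius`."""
    pos = rng.uniform(size=(num_nodes, 2))
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if np.linalg.norm(pos[i] - pos[j]) <= radius:
                edges.append((i, j))
    return edges


def sample_link_bandwidths(edges, b_M, rng, dist="uniform", sigma_N=0.5):
    """Assign one bandwidth per edge using the uniform or Beta(0.5, 0.5) setting."""
    n = len(edges)
    if dist == "uniform":
        # U((1 - sigma_N) * b_M, (1 + sigma_N) * b_M)
        bw = rng.uniform((1 - sigma_N) * b_M, (1 + sigma_N) * b_M, size=n)
    elif dist == "beta":
        # Beta(0.5, 0.5) * b_M
        bw = rng.beta(0.5, 0.5, size=n) * b_M
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return dict(zip(edges, bw))


rng = np.random.default_rng(0)
edges = random_geometric_graph(20, 0.4, rng)
uniform_bw = sample_link_bandwidths(edges, b_M=10.0, rng=rng, dist="uniform")
beta_bw = sample_link_bandwidths(edges, b_M=10.0, rng=rng, dist="beta")
```

$\mathrm{Beta}(0.5, 0.5)$ is U-shaped, so it places most links near either $0$ or $b_M$, giving a more polarized bandwidth profile than the uniform setting, which keeps every link within $\pm\sigma_N b_M$ of $b_M$.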

Theorems & Definitions (55)

Proposition 3.6
proof
Lemma 3.7
proof
Definition 3.8
Lemma 3.9
proof
Lemma 3.10
proof
Lemma 3.11
...and 45 more
