Decentralized Federated Learning Over Imperfect Communication Channels

Weicai Li; Tiejun Lv; Wei Ni; Jingbo Zhao; Ekram Hossain; H. Vincent Poor

TL;DR

This work analyzes decentralized federated learning (D-FL) over imperfect communication channels, deriving a bias in locally aggregated models and establishing a convergence upper bound that depends on both network topology and channel errors. By introducing matrices that decouple the impact of topology, channel quality, and local aggregations, the authors show how the optimal number of local aggregations $J^*$ can be identified via bound minimization, with and without a-priori channel knowledge. They formulate a sequence of optimization problems to find $J^*$ (and provide a one-dimensional search algorithm) and validate the theory on CNN-F-MNIST and ResNet-18-F-CIFAR100, demonstrating up to around $12.5$ percent training-accuracy gains over baselines when channel conditions are unknown. The results highlight the practical benefit of topology-aware, channel-aware aggregation strategies in large-scale, bandwidth-limited networks and motivate future work on mobility and adaptive model selection under fading channels.

Abstract

This paper analyzes the impact of imperfect communication channels on decentralized federated learning (D-FL) and then determines the optimal number of local aggregations per training round, adapting to the network topology and the imperfect channels. We first derive the bias of the locally aggregated D-FL models under imperfect channels relative to the ideal global models, which require perfect channels and aggregations. The bias reveals that excessive local aggregations can accumulate communication errors and degrade convergence. Based on the bias, we then analyze a convergence upper bound of D-FL. When the channels are unknown, minimizing this bound identifies the optimal number of local aggregations, which balances the benefit of further aggregation against the accumulation of communication errors. When the channels are known, the impact of communication errors can be alleviated, allowing the convergence upper bound to decrease throughout the aggregations. Experiments validate the convergence analysis and identify the optimal number of local aggregations on two widely considered image classification tasks. D-FL with an optimal number of local aggregations can outperform its potential alternatives by over 10% in training accuracy.
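The trade-off described in the abstract can be illustrated with a minimal sketch. The surrogate below is a hypothetical stand-in and not the paper's convergence bound: it assumes a consensus-error term that shrinks geometrically with the number of local aggregations $J$ and a channel-error term that grows linearly with $J$. The names `toy_bound`, `search_optimal_J`, `lam`, `disagreement`, and `noise_var` are all illustrative assumptions.

```python
def toy_bound(J, lam=0.8, disagreement=1.0, noise_var=0.01):
    """Hypothetical surrogate, NOT the bound derived in the paper.

    lam ** (2 * J) * disagreement : consensus error, shrinking as J grows
    J * noise_var                 : channel errors, accumulating as J grows
    """
    return (lam ** (2 * J)) * disagreement + J * noise_var


def search_optimal_J(bound, J_max=50):
    """Exhaustive one-dimensional search over J = 1, ..., J_max."""
    best_J = min(range(1, J_max + 1), key=bound)
    return best_J, bound(best_J)


# With the assumed defaults, the minimizer lies strictly inside the range:
# too few aggregations leave disagreement, too many accumulate errors.
J_star, bound_value = search_optimal_J(toy_bound)
```

If `noise_var` is set to zero in this toy surrogate, the function becomes monotonically decreasing in $J$ and the search simply returns `J_max`.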

Paper Structure (26 sections, 11 theorems, 89 equations, 10 figures, 1 table, 2 algorithms)

Introduction
Related Work
System Model and Assumptions
D-FL Model
Local Model Training
Local Model Aggregation
Communication Model
Convergence Analysis of D-FL under Imperfect Channel
Optimal Aggregation Schedule of D-FL
Numerical Results
CNN on F-MNIST
ResNet-18 on F-CIFAR100
Conclusion
Proof of Lemma \ref{theo1}
Proof of Lemma \ref{lemma1}
...and 11 more sections

Key Result

Proposition 1

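To allow the locally aggregated models of D-FL to converge to $\boldsymbol{\bar{\omega}}_I^{t}$, the local model of device $n$ used for local aggregations in the $t$-th round is initialized by $\mathbf{x}_{n,0}^{t}=Np_{n}\boldsymbol{\omega}_{n,I}^{t}$; or, equivalently, in stacked form, $[\mathbf{x}_{1,0}^{t},\ldots,\mathbf{x}_{N,0}^{t}]^{\top}=N\,\mathrm{diag}(\mathbf{p})\,[\boldsymbol{\omega}_{1,I}^{t},\ldots,\boldsymbol{\omega}_{N,I}^{t}]^{\top}$, where $\mathrm{diag}(\mathbf{p})$ is the diagonal matrix with $\mathbf{p}$ along its diagonal.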
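A small numerical sketch may help show why the scaling by $Np_n$ matters. It rests on several assumptions that are not stated in this excerpt: the target $\boldsymbol{\bar{\omega}}_I^{t}$ is taken to be the weighted average $\sum_n p_n\boldsymbol{\omega}_{n,I}^{t}$ with $\sum_n p_n = 1$, channels are perfect, models are scalars, and devices sit on a ring and average with uniform weights of $1/3$. The helper names `ring_mixing_step` and `local_aggregations` are hypothetical.

```python
def ring_mixing_step(x):
    """One local aggregation on a ring: each device averages its own value
    with its two neighbours (uniform 1/3 weights, doubly stochastic)."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def local_aggregations(x0, J):
    """Apply J consecutive local aggregations starting from x0."""
    x = list(x0)
    for _ in range(J):
        x = ring_mixing_step(x)
    return x


p = [0.1, 0.2, 0.3, 0.4]        # assumed aggregation weights, summing to 1
omega = [1.0, 2.0, 3.0, 4.0]    # assumed scalar local models
N = len(p)

# Assumed target: the weighted average of the local models.
target = sum(pn * wn for pn, wn in zip(p, omega))

# Initialization from Proposition 1: x_{n,0} = N * p_n * omega_n.
x_scaled = local_aggregations([N * pn * wn for pn, wn in zip(p, omega)], J=60)

# Without the scaling, plain averaging converges to the unweighted mean.
x_plain = local_aggregations(omega, J=60)
```

In this toy setting every entry of `x_scaled` approaches `target`, while every entry of `x_plain` approaches the unweighted mean of `omega`. Plain averaging preserves the mean of the initial values, so the scaling is what turns that mean into the weighted average.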

Figures (10)

Figure 1: The timeline of the $t$-th training round of D-FL.
Figure 2: The training accuracy of D-FL versus the training rounds with the CNN model and the non-i.i.d. F-MNIST dataset, where $N=10$, $\rho=0.5$, and $\kappa=1$. C-FL is also considered, with device $4$ serving as the central aggregator.
Figure 3: The upper and lower bounds of $\Phi$ under the CNN model, where the default setting is considered: $N=10$ and $\rho=0.5$.
Figure 4: The impact of the number of devices under the CNN model.
Figure 5: The impact of connection density under the CNN model.
...and 5 more figures

Theorems & Definitions (23)

Proposition 1
proof
Definition 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 1
...and 13 more
