
DFML: Decentralized Federated Mutual Learning

Yasser H. Khalil, Amir H. Estiri, Mahdi Beitollahi, Nader Asadi, Sobhan Hemati, Xu Li, Guojun Zhang, Xi Chen

TL;DR

DFML tackles the challenges of decentralized federated learning by enabling serverless mutual learning among heterogeneous clients without relying on public data. It introduces a joint objective $L = (1-\alpha)L_{WSM} + \alpha L_{KL}$, with a cyclic schedule for $\alpha^{(t)}$ to balance supervision and distillation, and peak models $\widehat{W}_n$ to stabilize global knowledge. The approach supports nonrestrictive heterogeneity and demonstrates superior convergence speed and global accuracy across IID and non-IID settings, outperforming decentralized baselines on multiple datasets and architectures. This work offers a scalable, privacy-preserving alternative for real-world deployments where central servers are impractical or undesirable and data/model heterogeneity is intrinsic.
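The joint objective can be made concrete with a minimal per-sample sketch in Python. This section does not define the individual terms, so their forms are assumptions. $L_{WSM}$ is replaced by plain cross-entropy on the client's own label. $L_{KL}$ is taken as the mean KL divergence from each peer model's prediction to the model's own prediction, as in deep mutual learning. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); entries with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_loss(own_logits: Sequence[float],
               peer_logits: Sequence[Sequence[float]],
               label: int,
               alpha: float) -> float:
    """Hypothetical L = (1 - alpha) * L_sup + alpha * L_KL for one sample.

    L_sup: cross-entropy of the model's own prediction against the true label
           (an assumed stand-in for the supervision term L_WSM).
    L_KL:  mean KL divergence from each peer's prediction to the model's own
           prediction (assumed deep-mutual-learning form).
    """
    p_own = softmax(own_logits)
    supervision = -math.log(p_own[label])
    peers = [softmax(z) for z in peer_logits]
    distillation = sum(kl_divergence(p, p_own) for p in peers) / len(peers) if peers else 0.0
    return (1 - alpha) * supervision + alpha * distillation


# Usage: one model and two peers on a 3-class sample with true label 0.
print(joint_loss([2.0, 0.5, -1.0], [[1.5, 0.7, -0.5], [2.2, 0.1, -1.2]], label=0, alpha=0.5))
```

Values of `alpha` near 0 make the update mostly supervised, and values near 1 make it mostly distillation from peers. The cyclic schedule for $\alpha^{(t)}$ moves between these two regimes over communication rounds.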

Abstract

In the realm of real-world devices, centralized servers in Federated Learning (FL) present challenges including communication bottlenecks and susceptibility to a single point of failure. Additionally, contemporary devices inherently exhibit model and data heterogeneity. Existing work lacks a Decentralized FL (DFL) framework capable of accommodating such heterogeneity without imposing architectural restrictions or assuming the availability of public data. To address these issues, we propose a Decentralized Federated Mutual Learning (DFML) framework that is serverless, supports nonrestrictive heterogeneous models, and avoids reliance on public data. DFML handles model and data heterogeneity through mutual learning, which distills knowledge between clients, and by cyclically varying the amount of supervision and distillation signals. Extensive experimental results demonstrate the consistent effectiveness of DFML in both convergence speed and global accuracy, outperforming prevalent baselines under various conditions. For example, with the CIFAR-100 dataset and 50 clients, DFML achieves a substantial increase of +17.20% and +19.95% in global accuracy under Independent and Identically Distributed (IID) and non-IID data shifts, respectively.
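The cyclic variation of supervision and distillation can be sketched as a schedule for $\alpha^{(t)}$, together with the peak-model update rule described in the Figure 3 caption. Figure 3 mentions a cosine annealing scheduler, but this section gives no exact formula. The shape is therefore an assumption: it starts at `alpha_min` and peaks mid-cycle. The even `period` and the bounds are also assumptions, and `cyclic_alpha` and `should_update_peak` are hypothetical names.

```python
import math


def cyclic_alpha(t: int, period: int, alpha_min: float = 0.0, alpha_max: float = 1.0) -> float:
    """Hypothetical cosine-shaped cyclic schedule for alpha^(t).

    Starts at alpha_min at t = 0, rises to alpha_max at t = period // 2,
    and returns to alpha_min at t = period, repeating every `period` rounds.
    """
    if period < 2 or period % 2:
        raise ValueError("period must be an even integer >= 2")
    phase = (t % period) / period  # position within the current cycle, in [0, 1)
    return alpha_min + (alpha_max - alpha_min) * (1 - math.cos(2 * math.pi * phase)) / 2


def should_update_peak(t: int, period: int) -> bool:
    """Peak-model update rule paraphrased from the Figure 3 caption (assumed reading).

    Update every round until alpha first reaches its maximum, then only on
    rounds where alpha is at its maximum again.
    """
    half = period // 2  # round offset of each alpha maximum within a cycle
    return t <= half or t % period == half


# Usage: inspect the schedule and the peak-update decisions every 5 rounds.
for t in range(0, 21, 5):
    print(t, round(cyclic_alpha(t, period=10), 3), should_update_peak(t, period=10))
```

Under these assumptions, with `period=10` the peak models are refreshed in rounds 0 through 5 and afterwards only in rounds 15, 25, and so on, when $\alpha^{(t)}$ is back at its maximum.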

Paper Structure

This paper contains 54 sections, 10 equations, 29 figures, 12 tables, and 3 algorithms.

Figures (29)

  • Figure 1: Demonstrating the adverse impact of model and data heterogeneity on global accuracy using decentralized FedAvg. The experiment uses the CIFAR-100 dataset with 50 clients. Homogeneous models and IID data signify clients with identical model architectures and data distributions. In contrast, heterogeneous models and non-IID data indicate variations in both model architectures and data distributions among clients. Additional experimental details can be found in the paper's section on model and data heterogeneity.
  • Figure 2: Our proposed DFML framework. In each communication round $t$, randomly selected clients (senders) send their locally trained models $W_n$ to another randomly chosen client (aggregator). Mutual learning takes place at the aggregator using $\alpha^{(t)}$. The updated models $W^+_n$ and $\alpha^{(t)}$ are then transmitted back to the senders. $\alpha^{(t)}$, computed by a scheduler function, controls the impact of the loss components in the objective function (see the paper's proposed approach section). Different shapes and sizes signify model and data heterogeneity. In this example, clients 2 and 4 act as senders, while client 1 serves as the aggregator.
  • Figure 3: Illustrating the impact of cyclically varying $\alpha$ on global accuracy. Peak models are updated until $\alpha$ first reaches its maximum and again every subsequent time $\alpha$ reaches its maximum. In this example, $\alpha$ is varied using a cosine annealing scheduler.
  • Figure 4: Demonstrating the global accuracy gain DFML achieves in comparison with decentralized FedAvg and HeteroFL under model and data heterogeneity. The CIFAR-100 dataset is used with 50 clients and CNN architectures.
  • Figure 5: Performance comparison between the different architecture clusters in both DFML and decentralized FedAvg. In this experiment, five nonrestrictive heterogeneous architectures are distributed among 50 clients. C0, C1, C2, C3, and C4 represent the average global accuracy of all models with CNN architectures [32, 64, 128, 256], [32, 64, 128], [32, 64], [16, 32, 64], and [8, 16, 32, 64], respectively.
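The message flow in the Figure 2 caption can be simulated without any training, which makes it easy to check who sends what to whom in a round. This is a sketch under stated assumptions. `Client`, `mutual_learning`, and `communication_round` are hypothetical names, and models are plain dictionaries. Mutual learning is a placeholder that only tags each model with the $\alpha^{(t)}$ it was trained with. The caption does not say whether the aggregator also updates its own model, so the sketch updates only the senders' models.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Client:
    cid: int
    # Stand-in for the locally trained weights W_n (hypothetical representation).
    model: Dict[str, float] = field(default_factory=dict)
    # Most recent alpha^(t) received from an aggregator.
    alpha: float = 0.0


def mutual_learning(models: Dict[int, Dict[str, float]], alpha: float) -> Dict[int, Dict[str, float]]:
    """Hypothetical placeholder for mutual learning at the aggregator.

    A real implementation would train every received model on the aggregator's
    local data with the alpha-weighted supervision + distillation objective.
    Here each model is only tagged so the message flow can be inspected.
    """
    return {cid: {**weights, "alpha_used": alpha} for cid, weights in models.items()}


def communication_round(clients: List[Client], num_senders: int, alpha: float,
                        rng: random.Random) -> Tuple[int, List[int]]:
    """One serverless round following the Figure 2 description."""
    # Randomly choose the aggregator, then the senders from the remaining clients.
    aggregator = rng.choice(clients)
    others = [c for c in clients if c.cid != aggregator.cid]
    senders = rng.sample(others, num_senders)

    # Senders -> aggregator: locally trained models W_n.
    received = {c.cid: c.model for c in senders}

    # Mutual learning at the aggregator, driven by alpha^(t).
    updated = mutual_learning(received, alpha)

    # Aggregator -> senders: updated models W_n^+ together with alpha^(t).
    for c in senders:
        c.model = updated[c.cid]
        c.alpha = alpha
    return aggregator.cid, [c.cid for c in senders]


# Usage: five clients, two senders per round, a fixed seed for reproducibility.
rng = random.Random(0)
clients = [Client(cid=n, model={"w": float(n)}) for n in range(1, 6)]
aggregator_id, sender_ids = communication_round(clients, num_senders=2, alpha=0.3, rng=rng)
print(aggregator_id, sender_ids)
```

In the sketch, the aggregator role is re-drawn every round, so no single client acts as a permanent point of aggregation.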