FedSiKD: Clients Similarity and Knowledge Distillation: Addressing Non-i.i.d. and Constraints in Federated Learning

Yousef Alsenani; Rahul Mishra; Khaled R. Ahmed; Atta Ur Rahman

TL;DR

This work targets non-i.i.d. data and device constraints in federated learning by introducing FedSiKD, a framework that first collects data distribution statistics from clients and then forms similarity-based clusters for localized knowledge distillation. Within each cluster, a leader model acts as a teacher to student models, transferring knowledge while reducing on-device computation and communication. The authors provide convergence analysis showing reduced intra-cluster variance and demonstrate strong empirical gains on MNIST and HAR under Dirichlet non-i.i.d. settings, including up to 25% higher accuracy on HAR (α=0.1) and 18% on MNIST (α=0.5), plus 17-20% improvement in the first five rounds. FedSiKD thus offers rapid, privacy-conscious learning suitable for resource-constrained and highly heterogeneous federated environments, with practical implications for real-world deployments.
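The pipeline summarized above (clients share distribution statistics, the server groups similar clients, and a teacher guides student models within each cluster) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the choice of per-feature mean and standard deviation as the shared statistics, the plain k-means routine, and the Hinton-style distillation loss with its `temperature` and `weight` parameters are all assumptions made for this example, as are the function names.

```python
import numpy as np

def client_statistics(features):
    """Summary a client might share: per-feature mean and std (an assumption)."""
    features = np.asarray(features, dtype=float)
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every center, then nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return labels

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, weight=0.5):
    """Standard KD loss (assumed form):
    weight * CE(student, labels) + (1 - weight) * T^2 * KL(teacher || student)."""
    targets = np.asarray(targets)
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(targets)), targets] + 1e-12).mean()
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1).mean()
    return weight * ce + (1.0 - weight) * temperature ** 2 * kl
```

In this sketch the server would call `kmeans` on the stacked outputs of `client_statistics`, and each student in a cluster would minimize `distillation_loss` against the logits of that cluster's teacher.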

Abstract

In recent years, federated learning (FL) has emerged as a promising technique for training machine learning models in a decentralized manner while preserving data privacy. Two factors make FL difficult in practice: client data are typically non-independent and identically distributed (non-i.i.d.), and client or edge devices are resource-constrained. In addition, training over a large number of communication rounds can increase the risk of model exploitation. Traditional FL approaches struggle with these challenges. We therefore introduce FedSiKD, which incorporates knowledge distillation (KD) within a similarity-based federated learning framework. As clients join the system, they securely share relevant statistics about their data distribution, which are used to group similar clients and promote intra-cluster homogeneity. This improves optimization efficiency and accelerates learning, while transferring knowledge from teacher to student models addresses device constraints. FedSiKD outperforms state-of-the-art algorithms on highly skewed data, with accuracy gains of 25% on the HAR dataset at $\alpha = 0.1$ and 18% on the MNIST dataset at $\alpha = 0.5$. Its faster convergence is illustrated by accuracy increases of 17% and 20% within the first five rounds on the HAR and MNIST datasets, respectively, highlighting its early-stage learning proficiency. Code is publicly available on GitHub (https://github.com/SimuEnv/FedSiKD).
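The $\alpha$ values quoted above are concentration parameters of the Dirichlet distribution used to simulate non-i.i.d. clients: smaller $\alpha$ means stronger label skew. The following is a minimal sketch of one common way to build such a split, assuming label-skew partitioning per class; the function name `dirichlet_partition` and its parameters are hypothetical and not taken from the FedSiKD repository.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Share of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn the shares into split points inside idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```

With a small value such as `alpha=0.1`, most clients end up holding samples from only a few classes, whereas larger values move the split toward a uniform class mix on every client.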

Paper Structure (30 sections, 17 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Data-based approach
Similarity-Based Client Clustering Approach
Knowledge Distillation and Transfer in Federated Learning
Clients resource limitation
Heterogeneous Resources
Personalization
Preliminaries
Non-i.i.d. in Federated Learning
FedSiKD
Similarity-Based Client Clustering
Convergence and Complexity Analysis
Knowledge Distillation
Clustered Knowledge Distillation
...and 15 more sections

Figures (4)

Figure 1: Local drift
Figure 2: (1) Clients share their statistics with the global server to identify an appropriate cluster. (2) The global server assigns clients to clusters. (3) Knowledge distillation and federated learning training proceed within each cluster.
Figure 3: FedSiKD: Federated Learning with Similarity-based Client Clustering and Knowledge Distillation
Figure 4: Test Accuracy for MNIST and HAR Datasets at Different Levels of Non-i.i.d. Data Distribution
