FedSiKD: Clients Similarity and Knowledge Distillation: Addressing Non-i.i.d. and Constraints in Federated Learning
Yousef Alsenani, Rahul Mishra, Khaled R. Ahmed, Atta Ur Rahman
TL;DR
This work targets non-i.i.d. data and device constraints in federated learning by introducing FedSiKD, a framework that first collects data distribution statistics from clients and then forms similarity-based clusters for localized knowledge distillation. Within each cluster, a leader model acts as a teacher to student models, transferring knowledge while reducing on-device computation and communication. The authors provide convergence analysis showing reduced intra-cluster variance and demonstrate strong empirical gains on MNIST and HAR under Dirichlet non-i.i.d. settings, including up to 25% higher accuracy on HAR (α=0.1) and 18% on MNIST (α=0.5), plus 17-20% improvement in the first five rounds. FedSiKD thus offers rapid, privacy-conscious learning suitable for resource-constrained and highly heterogeneous federated environments, with practical implications for real-world deployments.
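The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the statistic each client shares is a normalized label histogram, it uses a plain k-means written in NumPy, and it picks each cluster leader as the client nearest to the cluster centroid. All function names and the leader-selection rule are hypothetical.

```python
import numpy as np


def dirichlet_label_proportions(num_clients, num_classes, alpha, seed=0):
    """Sample one class-proportion vector per client from Dir(alpha).

    Smaller alpha concentrates each client's data on fewer classes.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(alpha * np.ones(num_classes), size=num_clients)


def assign_to_nearest(points, centers):
    """Return, for each point, the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, num_iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct clients.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(num_iters):
        labels = assign_to_nearest(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign_to_nearest(points, centers), centers


def pick_leaders(points, labels, centers):
    """ASSUMPTION: the leader is the client closest to its cluster centroid."""
    points = np.asarray(points, dtype=float)
    leaders = {}
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if len(members) > 0:
            d = np.linalg.norm(points[members] - centers[j], axis=1)
            leaders[j] = int(members[d.argmin()])
    return leaders
```

For example, `kmeans(dirichlet_label_proportions(20, 6, alpha=0.5), k=3)` groups 20 simulated clients into three clusters, and `pick_leaders` then nominates one teacher candidate per cluster. The server only sees the summary vectors, never the raw samples.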
Abstract
In recent years, federated learning (FL) has emerged as a promising technique for training machine learning models in a decentralized manner while preserving data privacy. However, the non-independent and identically distributed (non-i.i.d.) nature of client data, coupled with resource constraints on client and edge devices, presents significant challenges in FL. Furthermore, training over a large number of communication rounds can be risky and may leave the model open to exploitation. Traditional FL approaches may struggle with these challenges. We therefore introduce FedSiKD, which incorporates knowledge distillation (KD) within a similarity-based federated learning framework. As clients join the system, they securely share relevant statistics about their data distribution; these statistics are used to group similar clients into clusters, promoting intra-cluster homogeneity. This homogeneity improves optimization efficiency and accelerates learning, while KD transfers knowledge from teacher to student models and eases device constraints. FedSiKD outperforms state-of-the-art algorithms, achieving 25\% and 18\% higher accuracy on highly skewed data at $\alpha = 0.1$ on HAR and $\alpha = 0.5$ on MNIST, respectively. Its faster convergence is illustrated by accuracy gains of 17\% and 20\% within the first five rounds on the HAR and MNIST datasets, respectively, highlighting its early-stage learning proficiency. The code is publicly available on GitHub (https://github.com/SimuEnv/FedSiKD).
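The teacher-to-student transfer can be illustrated with the standard temperature-scaled distillation loss from the knowledge distillation literature. Whether FedSiKD uses exactly this form is an assumption; the function names, `temperature`, and `weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, weight=0.5):
    """weight * CE(labels, student) + (1 - weight) * T^2 * KL(teacher || student).

    ASSUMPTION: a generic KD objective, not necessarily the paper's exact loss.
    """
    eps = 1e-12
    n = len(labels)
    # Hard-label cross-entropy on the student's ordinary (T = 1) predictions.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(n), labels] + eps).mean()
    # Soft-label term: KL divergence between temperature-softened outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = (q_teacher * (np.log(q_teacher + eps) - np.log(q_student + eps))).sum(axis=1).mean()
    # The T^2 factor keeps the soft-term gradients on a scale comparable to the CE term.
    return weight * ce + (1.0 - weight) * temperature ** 2 * kl
```

In a cluster, the leader's logits would play the role of `teacher_logits` and each constrained client would minimize this loss for its smaller student model. The soft term vanishes when the student reproduces the teacher's softened outputs exactly.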
