FedD2S: Personalized Data-Free Federated Knowledge Distillation
Kawa Atapour, S. Jamal Seyedmohammadi, Jamshid Abouei, Arash Mohammadi, Konstantinos N. Plataniotis
TL;DR
FedD2S tackles data heterogeneity in personalized federated learning by introducing a data-free, two-phase mutual knowledge distillation framework with a novel deep-to-shallow layer-dropping mechanism. Local models progressively exclude their deeper layers from federation, preserving personalized knowledge while still allowing a global head to distill partial representations back to clients. The server aggregates knowledge without any public dataset: head models convert intermediate representations into soft labels, and KL-divergence and cross-entropy losses are enforced in both distillation directions. Empirical results on FEMNIST, CIFAR10, CINIC10, and CIFAR100 show faster convergence and improved fairness compared to multiple baselines, and sensitivity analyses reveal how the layer-dropping rate, the dropping set, and the degree of data heterogeneity affect performance. The approach provides a practical, privacy-preserving path to robust personalization in federated settings with heterogeneous client data.
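As a rough illustration of the loss structure mentioned above, the following sketch combines a KL-divergence term against soft labels with a cross-entropy term against the true label. It is not the authors' implementation: the function names, the mixing weight `alpha`, and the softmax `temperature` are assumptions introduced here for illustration only, and the summary above does not specify how the two terms are weighted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label] + eps)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a KL term (soft labels) and a CE term (hard label).

    `alpha` and `temperature` are hypothetical hyperparameters, not values
    taken from the paper.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kd_term = kl_divergence(teacher_soft, student_soft)
    ce_term = cross_entropy(softmax(student_logits), label)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In a mutual (two-direction) setting, the same function could be called twice with the roles of student and teacher swapped, once for server-to-client and once for client-to-server distillation.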
Abstract
This paper addresses the challenge of mitigating data heterogeneity among clients within a Federated Learning (FL) framework. The model-drift issue, arising from the non-IID nature of client data, often results in suboptimal personalization of a global model compared to locally trained models for each client. To tackle this challenge, we propose a novel approach named FedD2S for Personalized Federated Learning (pFL), leveraging knowledge distillation. FedD2S incorporates a deep-to-shallow layer-dropping mechanism in the data-free knowledge distillation process to enhance local model personalization. Through extensive simulations on diverse image datasets (FEMNIST, CIFAR10, CINIC10, and CIFAR100), we compare FedD2S with state-of-the-art FL baselines. The proposed approach demonstrates superior performance, characterized by accelerated convergence and improved fairness among clients. The introduced layer-dropping technique effectively captures personalized knowledge, resulting in enhanced performance compared to alternative FL models. Moreover, we investigate the impact of key hyperparameters, such as the participation ratio and layer-dropping rate, providing valuable insights into the optimal configuration for FedD2S. The findings demonstrate the efficacy of adaptive layer-dropping in the knowledge distillation process to achieve enhanced personalization and performance across diverse datasets and tasks.
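To make the deep-to-shallow idea concrete, the sketch below shows one possible schedule in which the deepest layer still being shared is withdrawn from federation every fixed number of communication rounds. The abstract does not give the actual schedule, so the function name `shared_layers` and the parameters `rounds_per_drop` and `min_shared` are hypothetical and serve only to illustrate the ordering (deepest layers dropped first).

```python
def shared_layers(num_layers, round_idx, rounds_per_drop, min_shared=1):
    """Return indices of the layers still shared with the server.

    Layers are indexed from 0 (shallowest) to num_layers - 1 (deepest).
    Every `rounds_per_drop` rounds, the deepest remaining shared layer is
    dropped; at least `min_shared` shallow layers always stay shared.
    This fixed-interval schedule is an assumption, not the paper's rule.
    """
    if rounds_per_drop <= 0:
        raise ValueError("rounds_per_drop must be positive")
    dropped = round_idx // rounds_per_drop
    kept = max(min_shared, num_layers - dropped)
    return list(range(kept))


# Example: a 4-layer model that drops one layer every 10 rounds.
for r in (0, 10, 20, 50):
    print(r, shared_layers(num_layers=4, round_idx=r, rounds_per_drop=10))
```

Under this assumed schedule, a smaller `rounds_per_drop` corresponds to a faster layer-dropping rate, which keeps more of the model local (personalized) earlier in training.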
