FedD2S: Personalized Data-Free Federated Knowledge Distillation

Kawa Atapour; S. Jamal Seyedmohammadi; Jamshid Abouei; Arash Mohammadi; Konstantinos N. Plataniotis

TL;DR

FedD2S tackles data heterogeneity in personalized federated learning by introducing a data-free, two-phase mutual knowledge distillation framework with a novel deep-to-shallow layer-dropping mechanism. Local models progressively drop deeper layers from participating in federation, preserving personalized knowledge while enabling a global head to distill partial representations back to clients. The server aggregates knowledge without any public dataset by using head models to convert intermediate representations into soft labels and enforcing KL-divergence and cross-entropy losses on both distillation directions. Empirical results across FEMNIST, CIFAR10, CINIC10, and CIFAR100 show faster convergence and improved fairness compared to multiple baselines, with sensitivity analyses revealing the effects of layer-dropping rate, dropping set, and data heterogeneity on performance. The approach provides a practical, privacy-preserving path to robust personalization in federated settings with heterogeneous client data.
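
As a rough illustration of the loss structure mentioned above, the sketch below combines a KL-divergence term on temperature-softened soft labels with a cross-entropy term on the true label. It is not the paper's implementation: the function names, the `temperature` and `kd_weight` parameters, and the convex weighting of the two terms are assumptions made for this example only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability vector (soft labels)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of a predicted distribution against one integer label."""
    return -math.log(probs[label] + eps)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.5):
    """Hypothetical combined loss: weighted KL term plus cross-entropy term.

    The weighting scheme and default values are assumptions for this
    sketch, not values taken from the paper.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kd = kl_divergence(teacher_soft, student_soft)
    ce = cross_entropy(softmax(student_logits), label)
    return kd_weight * kd + (1.0 - kd_weight) * ce
```

In a two-phase scheme such as the one described, the same loss shape could plausibly be reused in both directions, with the server-side head acting as the student in one phase and as the teacher in the other; how FedD2S assigns these roles exactly is specified in the paper's methodology section, not here.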

Abstract

This paper addresses the challenge of mitigating data heterogeneity among clients within a Federated Learning (FL) framework. The model-drift issue, arising from the non-IID nature of client data, often results in suboptimal personalization of a global model compared to locally trained models for each client. To tackle this challenge, we propose a novel approach named FedD2S for Personalized Federated Learning (pFL), leveraging knowledge distillation. FedD2S incorporates a deep-to-shallow layer-dropping mechanism in the data-free knowledge distillation process to enhance local model personalization. Through extensive simulations on diverse image datasets (FEMNIST, CIFAR10, CINIC10, and CIFAR100), we compare FedD2S with state-of-the-art FL baselines. The proposed approach demonstrates superior performance, characterized by accelerated convergence and improved fairness among clients. The introduced layer-dropping technique effectively captures personalized knowledge, resulting in enhanced performance compared to alternative FL models. Moreover, we investigate the impact of key hyperparameters, such as the participation ratio and layer-dropping rate, providing valuable insights into the optimal configuration for FedD2S. The findings demonstrate the efficacy of adaptive layer-dropping in the knowledge distillation process to achieve enhanced personalization and performance across diverse datasets and tasks.
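
One way to picture a deep-to-shallow layer-dropping schedule is sketched below. It is only an assumption-laden illustration: the names `shared_depth` and `split_layers`, the `rounds_per_drop` and `min_shared` parameters, and the rule of dropping one layer every fixed number of rounds are all hypothetical and not taken from the paper.

```python
def shared_depth(round_idx, num_layers, rounds_per_drop, min_shared=1):
    """Number of shallow layers still taking part in federation.

    Assumed rule for this sketch: every `rounds_per_drop` rounds, one more
    layer (starting from the deepest) stops participating, down to a floor
    of `min_shared` layers.
    """
    dropped = round_idx // rounds_per_drop
    return max(min_shared, num_layers - dropped)

def split_layers(layers, depth):
    """Split an ordered layer list into (shared shallow part, local deep part)."""
    return layers[:depth], layers[depth:]

# Usage: a hypothetical 4-layer model with one layer dropped every 10 rounds.
layers = ["conv1", "conv2", "fc1", "fc2"]
for round_idx in (0, 10, 20, 100):
    shared, local = split_layers(layers, shared_depth(round_idx, len(layers), 10))
    print(round_idx, shared, local)
```

Under this assumed schedule, the deepest layers leave the federation first and stay purely local, which matches the intuition that deeper layers carry more client-specific knowledge.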

Paper Structure (16 sections, 13 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Problem Statement
Knowledge Distillation
Methodology: FedD2S Algorithm
Clients-to-Server Distillation
Local Knowledge Extraction
Local Knowledge Transferring
Server-to-Client Distillation
Global Knowledge Extraction
Global Knowledge Transferring
Simulation Results
Simulation Setup
Simulation Results and Performance Analysis
Sensitivity Analysis
...and 1 more sections

Figures (7)

Figure 1: Illustration of the proposed FedD2S workflow.
Figure 2: Illustration of data heterogeneity among 10 clients on the CIFAR-10 dataset, where the x-axis shows client IDs, the y-axis indicates class IDs, and the size of squares indicates the number of training samples available for each class per client. For comparison, the number of samples for a square is reported in the left figure.
Figure 3: Learning curves of average UA (%) of the proposed FedD2S compared to baseline methods across different datasets, with $\rho=0.2$, $\alpha=0.1$, and $Z_0=3$.
Figure 4: Comparison of client distribution across accuracy ranges for different datasets (FEMNIST, CIFAR10, CINIC10, and CIFAR100) under the conditions $\alpha=0.1$ and $\rho=0.2$.
Figure 5: The influence of layer-dropping dynamics with varying epochs across different datasets.
...and 2 more figures
