Knowledge Distillation in Federated Learning: a Survey on Long Lasting Challenges and New Solutions

Laiqiao Qin; Tianqing Zhu; Wanlei Zhou; Philip S. Yu

TL;DR

This survey addresses the challenge of enabling privacy-preserving, efficient, and personalized learning across distributed clients by applying Knowledge Distillation (KD) within Federated Learning (FL). It presents a taxonomy of KD-based FL methods, also called federated distillation (FD): Feature-based, Parameter-based, and Data-based FD. It then analyzes what each one shares. Feature-based FD exchanges model features such as logits instead of parameters, Parameter-based FD still shares model parameters, and Data-based FD shares a compressed (distilled) local dataset. Sharing logits or distilled data instead of full parameters reduces privacy risks and communication costs. The paper systematically links KD properties to FL challenges and shows how KD can mitigate privacy leakage, non-IID data, communication bottlenecks, and the need for model personalization. It also highlights practical trade-offs and open problems such as teacher credibility, data dependency, and data-free distillation. The work provides a comprehensive framework for designing KD-based FL systems. It gives researchers and practitioners guidance on choosing architectures, data strategies, and deployment settings to balance privacy, efficiency, and personalization in real-world distributed learning.
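To make the logit exchange in Feature-based FD concrete, here is a minimal sketch of one communication round. It assumes that all clients can run their models on a shared, unlabeled public dataset. That assumption is exactly the "data dependency" issue listed among the open problems. Each client uploads only its logits on that data, and the server averages them into consensus soft targets. The names `client_upload` and `server_aggregate` and the uniform averaging rule are hypothetical and chosen for illustration. They do not describe a specific method from the survey.

```python
from typing import Callable, List, Optional, Sequence

Logits = List[float]
# Hypothetical client model interface: maps one input vector to one logit vector.
Model = Callable[[Sequence[float]], Logits]


def client_upload(model: Model, public_inputs: Sequence[Sequence[float]]) -> List[Logits]:
    """Client side: run the private model on the shared public inputs and return only logits.

    No parameters leave the client; the upload size scales with
    (number of public samples x number of classes), not with model size.
    """
    return [list(model(x)) for x in public_inputs]


def server_aggregate(
    uploads: Sequence[Sequence[Logits]],
    weights: Optional[Sequence[float]] = None,
) -> List[Logits]:
    """Server side: weighted average of client logits per public sample and per class.

    The result acts as the ensemble "teacher" target that each client
    can then distill toward on the same public inputs.
    """
    n_clients = len(uploads)
    if weights is None:
        weights = [1.0 / n_clients] * n_clients  # uniform weighting by default
    n_samples, n_classes = len(uploads[0]), len(uploads[0][0])
    return [
        [sum(w * uploads[c][s][k] for c, w in enumerate(weights)) for k in range(n_classes)]
        for s in range(n_samples)
    ]


# Toy usage: two hypothetical clients with different linear "models", two public inputs.
public_inputs = [[1.0, 0.0], [0.0, 1.0]]
client_a = lambda x: [2.0 * x[0], 2.0 * x[1]]
client_b = lambda x: [0.0, 4.0 * x[1]]
uploads = [client_upload(m, public_inputs) for m in (client_a, client_b)]
consensus = server_aggregate(uploads)
print(consensus)  # [[1.0, 0.0], [0.0, 3.0]]
```

In a full round, each client would then minimize a distillation loss between its own logits and `consensus` on the public inputs. Only logit vectors are exchanged, so the clients in this sketch need not share a model architecture.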

Abstract

Federated Learning (FL) is a distributed, privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while keeping raw data local. However, traditional FL faces several challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity. KD has been widely applied in FL since 2020 to tackle these challenges. KD is a well-established and effective technique for model compression and enhancement. Its core idea is to transfer knowledge between models by exchanging logits from the output layer or features from intermediate layers. These properties make KD an excellent solution to the long-lasting challenges in FL. To date, few reviews have summarized and analyzed current trends and methods for applying KD efficiently in FL. This article provides a comprehensive survey of KD-based FL that focuses on addressing these challenges. First, we give an overview of KD-based FL. This overview covers its motivation, basics, and taxonomy, compares it with traditional FL, and examines where in the FL process KD should be executed. In the appendix, we also analyze the critical factors in KD-based FL: teachers, knowledge, data, and methods. We then discuss how KD can address the challenges in FL, including privacy protection, data heterogeneity, communication efficiency, and personalization. Finally, we discuss the open challenges facing KD-based FL algorithms and future research directions. We hope this survey can provide insights and guidance for researchers and practitioners in the FL area.
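The logit exchange described above is most often implemented with the soft-target loss of Hinton et al. In that loss, the student matches the teacher's temperature-softened output distribution and also fits the hard label. Below is a minimal single-example sketch in pure Python. The temperature of 4.0 and the weight `alpha` of 0.5 are illustrative defaults and are not values taken from the survey.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Hinton-style distillation loss for a single example.

    alpha weights the hard-label cross-entropy (computed at temperature 1);
    (1 - alpha) weights the soft-target KL term, which is scaled by
    temperature**2 so its gradient magnitude stays comparable across temperatures.
    """
    hard_ce = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_kl = kl_divergence(p_teacher, p_student)
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * soft_kl
```

If the student's logits already equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains. In FL, the teacher logits may come from a server-side ensemble or from peer clients rather than from a single pretrained model.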

Paper Structure

This paper contains 51 sections, 2 equations, 9 figures, and 5 tables.

Introduction
Background
Federated Learning
Knowledge Distillation
Long-lasting Challenges in FL
KD-based FL
Knowledge-distillation-based Federated Learning
KD-based FL: Motivation
How KD tackles the challenges of FL
Properties of KD
KD-based FL: Taxonomy
Feature-based FD (No parameter sharing)
Parameter-based FD (Sharing Parameters)
Data-based FD (Based on dataset distillation)
KD-based FL: Comparison with traditional FL
...and 36 more sections

Figures (9)

Figure 1: The typical training process of knowledge distillation
Figure 2: The typical training process of federated learning consists of ① model distribution, ② local model update, and ③ global model update.
Figure 3: KD uses a teacher-student architecture, in which the teacher model transfers knowledge to the student model.
Figure 4: The FL process consists of six steps: ① preprocessing of the global model by clients, ② local training by clients, ③ further processing of local models by clients, ④ preprocessing of local models by the server upon receiving them, ⑤ aggregation of local models by the server to obtain the global model, and ⑥ further processing of the global model by the server. KD can be employed in all six steps.
Figure 5: Comparison of the three KD-based FL methods. Feature-based FD shares model features, parameter-based FD shares model parameters, and data-based FD shares a compressed local dataset.
...and 4 more figures
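The three-step loop in Figure 2 (model distribution, local model update, global model update) is the traditional FL baseline that KD-based methods modify. Below is a minimal FedAvg-style sketch of that loop. It assumes a one-parameter linear model y = w * x that is trained by full-batch gradient descent and aggregated by a data-size-weighted average. The model, learning rate, epoch count, and toy data are all illustrative assumptions.

```python
from typing import List, Tuple

# One client's private data: (x, y) pairs for a one-parameter linear model y = w * x.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.1, epochs: int = 5) -> float:
    """Step 2: full-batch gradient descent on mean squared error, starting from the global w."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(global_w: float, clients: List[Dataset]) -> float:
    """One round: (1) send global_w to every client, (2) update locally, (3) aggregate."""
    local_ws = [local_update(global_w, data) for data in clients]  # steps 1 and 2
    total = sum(len(data) for data in clients)
    # Step 3: average the local weights, weighting each client by its number of samples.
    return sum(len(data) / total * w for data, w in zip(clients, local_ws))


# Toy usage: two clients whose inputs cover different ranges but follow y = 3x.
clients = [
    [(0.5, 1.5), (1.0, 3.0)],
    [(1.5, 4.5), (2.0, 6.0)],
]
w = 0.0
for _ in range(20):
    w = fedavg_round(w, clients)
print(round(w, 6))  # 3.0
```

KD-based FL methods change what crosses the network in steps 1 and 3, for example by sending logits instead of `w`. They may also insert a distillation step before or after aggregation, which corresponds to the six KD insertion points in Figure 4.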
