Federated Distillation: A Survey
Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, and Dacheng Tao
TL;DR
Federated Distillation (FD) addresses core limitations of federated learning by transferring knowledge through logits rather than sharing full model parameters, which allows heterogeneous client models and reduces communication cost and privacy risk. The survey categorizes FD formulations, schemes for handling data, system, and model heterogeneity, methods to mitigate client drift and catastrophic forgetting, and privacy-preserving strategies, and it details a broad range of applications across industry, computer vision, NLP, and healthcare. It highlights public data, synthetic data, and global/local alignment as key design choices, and discusses the trade-off between communication efficiency and accuracy. The practical impact of FD lies in enabling scalable, privacy-aware distributed learning with flexible model architectures and robust performance in real-world, heterogeneous environments.
Abstract
Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients. Despite its promise, FL encounters challenges such as high communication costs for large-scale models and the necessity for uniform model architectures across all clients and the server. These challenges severely restrict the practical applications of FL. To address these limitations, the integration of knowledge distillation (KD) into FL has been proposed, forming what is known as Federated Distillation (FD). FD enables more flexible knowledge transfer between clients and the server, going beyond the mere sharing of model parameters. Because clients exchange distilled knowledge rather than parameters, FD removes the need for identical model architectures across clients and the server and mitigates the communication costs associated with training large-scale models. This paper aims to offer a comprehensive overview of FD, highlighting its latest advancements. It delves into the fundamental principles underlying the design of FD frameworks, delineates FD approaches for tackling various challenges, and provides insights into the diverse applications of FD across different scenarios.
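To make the idea of exchanging logits instead of parameters concrete, the following is a minimal sketch, not a method taken from the survey. It assumes that every client evaluates its own model on a shared set of public samples, that the server aggregates by simple averaging, and that each client then distills from the averaged logits with a temperature-softened KL divergence. The function names (`softmax`, `aggregate_logits`, `distillation_loss`) and the toy values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over one logit vector."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_logits(client_logits):
    """Average per-sample logits across clients (assumed aggregation rule).

    client_logits[k][i] is client k's logit vector for public sample i.
    Only these vectors are exchanged; model parameters never leave a client,
    so client architectures are free to differ.
    """
    num_clients = len(client_logits)
    num_samples = len(client_logits[0])
    num_classes = len(client_logits[0][0])
    return [
        [
            sum(client_logits[k][i][c] for k in range(num_clients)) / num_clients
            for c in range(num_classes)
        ]
        for i in range(num_samples)
    ]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over samples, using softened distributions."""
    total = 0.0
    for student, teacher in zip(student_logits, teacher_logits):
        p = softmax(teacher, temperature)
        q = softmax(student, temperature)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)


# Toy example: two clients, two public samples, three classes (made-up values).
client_a = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
client_b = [[0.0, 2.0, 1.0], [0.0, 3.0, 0.0]]

consensus = aggregate_logits([client_a, client_b])
loss_a = distillation_loss(client_a, consensus)  # signal client A would minimize
```

In this sketch the per-round communication scales with the number of public samples times the number of classes rather than with model size, which is the intuition behind the communication savings described above. Real FD schemes differ in what is shared and how it is aggregated.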
