Federated Distillation: A Survey

Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, and Dacheng Tao

TL;DR

Federated Distillation (FD) addresses the core limitations of federated learning by transferring knowledge through logits rather than sharing full model parameters, enabling heterogeneous client models and reducing communication and privacy risks. The surveyed work categorizes FD formulations, schemes for handling data, system, and model heterogeneity, methods to mitigate client drift and catastrophic forgetting, and privacy-preserving strategies, while detailing a broad range of applications across industry, computer vision, NLP, and healthcare. It highlights public data, synthetic data, and global/local alignment as key design choices, and discusses trade-offs between communication efficiency and accuracy. The practical impact of FD lies in enabling scalable, privacy-aware distributed learning with flexible model architectures and robust performance in real-world, heterogeneous environments.

Abstract

Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients. Despite its promise, FL encounters challenges such as high communication costs for large-scale models and the necessity for uniform model architectures across all clients and the server. These challenges severely restrict the practical applications of FL. To address these limitations, the integration of knowledge distillation (KD) into FL has been proposed, forming what is known as Federated Distillation (FD). FD enables more flexible knowledge transfer between clients and the server, surpassing the mere sharing of model parameters. By eliminating the need for identical model architectures across clients and the server, FD mitigates the communication costs associated with training large-scale models. This paper aims to offer a comprehensive overview of FD, highlighting its latest advancements. It delves into the fundamental principles underlying the design of FD frameworks, delineates FD approaches for tackling various challenges, and provides insights into the diverse applications of FD across different scenarios.
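To make the logit-based knowledge transfer described above concrete, here is a minimal sketch of one FD round. It assumes a shared public dataset on which every client evaluates its own model, a server that simply averages the resulting logits, and clients that distill from that average. This is only one of the design choices the survey covers, not the authors' specific method. The names `log_softmax`, `aggregate_logits`, and `distillation_loss` are hypothetical, and the common practice of scaling the loss by the squared temperature is omitted for brevity.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over one logit vector (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def aggregate_logits(client_logits):
    """Server step (assumed scheme): average per-sample logits across clients.

    client_logits[k][i] is client k's logit vector for public sample i.
    Clients may use different architectures; only output sizes must match.
    """
    num_clients = len(client_logits)
    num_samples = len(client_logits[0])
    num_classes = len(client_logits[0][0])
    return [
        [
            sum(client_logits[k][i][c] for k in range(num_clients)) / num_clients
            for c in range(num_classes)
        ]
        for i in range(num_samples)
    ]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Client step: mean KL(teacher || student) between softened distributions."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        log_p = log_softmax(t, temperature)  # teacher (aggregated) distribution
        log_q = log_softmax(s, temperature)  # student (local model) distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(student_logits)


# Two clients, two public samples, three classes (illustrative values only).
client_logits = [
    [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    [[1.0, 1.0, 0.0], [0.0, 3.0, -1.0]],
]
teacher = aggregate_logits(client_logits)
loss_client_0 = distillation_loss(client_logits[0], teacher)
```

In this sketch each client uploads one logit vector per public sample, so the payload grows with the number of public samples and classes rather than with the number of model parameters. That is the source of the communication savings and architecture flexibility the abstract refers to.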

Paper Structure

This paper contains 22 sections, 7 equations, 15 figures, and 6 tables.

Introduction
Background
Federated Learning
Knowledge Distillation
FD Formulation
Problem Description
Main Framework
FD Schemes
FD for Heterogeneity
Data Heterogeneity
System Heterogeneity
Model Heterogeneity
FD for Client-drift
FD for Catastrophic Forgetting
FD for Communication Costs
...and 7 more sections

Figures (15)

Figure 1: An overview of the organization of the different sections in this paper.
Figure 2: An overview of three main federated learning categories. (a) Horizontal Federated Learning. Step 1: Download the trained global model and repeat the training cycle for use on each client node. Step 2: Homogeneous clients from the same domain for training the global model utilizing private data. (b) Vertical Federated Learning. Step 1: Download the trained global model and repeat the training cycle for use on each client node. Step 2: Heterogeneous clients assist in training the global model by sharing encrypted local model updates. (c) Federated Transfer Learning. Step 1: Train the global model with a heterogeneous client in the encrypted state, similar to VFL. Step 2: Obtaining personalized local models from global model through transfer learning.
Figure 3: Illustration of the vanilla federated averaging framework.
Figure 4: The classical KD framework.
Figure 5: The main FD framework.
...and 10 more figures
