Knowledge Distillation for Federated Learning: a Practical Guide

Alessio Mora; Irene Tenison; Paolo Bellavista; Irina Rish

TL;DR

This paper surveys knowledge distillation (KD) approaches tailored for federated learning (FL), focusing on how they overcome the limitations of FedAvg-like algorithms: the requirement of homogeneous models, poor performance on non-IID data, and high communication costs. It introduces a taxonomy of KD-based FL methods, detailing server-side and client-side mechanisms, including data-free generation and intermediate-feature sharing, and analyzes their trade-offs, scalability, and privacy implications. The review covers enhancements to FedAvg via ensemble distillation, federated adaptations of codistillation, and regularization strategies, with extensions to personalization, unlearning, and class-incremental learning. The work provides adoption guidelines and highlights promising directions for practical deployment and future research in KD-enabled FL.

Abstract

Federated Learning (FL) enables the training of deep learning models without centrally collecting possibly sensitive raw data. The most widely used FL algorithms are parameter-averaging schemes (e.g., Federated Averaging), which have well-known limitations: they require homogeneous models, incur high communication costs, and perform poorly in the presence of heterogeneous data distributions. Federated adaptations of regular Knowledge Distillation (KD) can solve or mitigate these weaknesses, while possibly introducing other trade-offs. In this article, we present a focused review of state-of-the-art KD-based algorithms specifically tailored for FL, providing both a novel classification of the existing approaches and a detailed technical description of their pros, cons, and trade-offs.
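
The contrast drawn above between parameter averaging and distillation can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`fedavg`, `distillation_loss`), the plain-list representation of weights and logits, the default temperature, and the T^2 scaling of the loss are all assumptions chosen for illustration. The point it tries to show is that averaging operates on parameters, so every client must share one architecture, whereas distillation operates on model outputs, so teacher and student only need to agree on the label space.

```python
import math


def fedavg(client_weights, client_sizes):
    """Weighted parameter average (FedAvg-style aggregation).

    Assumes every client sends a flat weight vector of the same length,
    which is exactly the model-homogeneity requirement.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened outputs.

    Only the logits are compared, so the two models may have different
    architectures. The T^2 factor is a common convention, assumed here.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Two clients with 1 and 3 local samples, each holding two parameters.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# A student that matches its teacher incurs no loss; a mismatch is penalized.
loss_match = distillation_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
loss_mismatch = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.0, -1.0])
```

In this sketch the aggregation step needs the full weight vectors, while the distillation step needs only one logit vector per sample, which hints at why exchanging model outputs can reduce communication and relax the shared-architecture constraint.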

Paper Structure (27 sections, 3 equations, 3 figures, 2 tables)

Introduction
Background
Knowledge Distillation
Codistillation
Proposed Taxonomy and Classification
Enabling FL Model Heterogeneity via KD
Enhancing FedAvg Aggregation
Federated Adaptations of Codistillation
Disclosing Aggregated Statistics of Model Responses on Local Data
Exchanging Model Responses on Publicly Available Data
Leveraging Intermediate Features
Comparison and Adoption Guidelines
Tackling FL Data Heterogeneity via KD
Server-side Refinement of Global Model
Refinement on Publicly Available Data
...and 12 more sections

Figures (3)

Figure 1: Taxonomy of KD-based solutions for FL issues.
Figure 2: KD-based solutions for FL issues. (a) FedAvg, (b) statistic-based federated CD, (c) response-based federated CD.
Figure 3: Local-global distillation using a regularization term. $w_{t}$ represents the global model at round $t$. $w_{t+1}^k$ is the local model.
