Knowledge Distillation for Federated Learning: a Practical Guide

Alessio Mora, Irene Tenison, Paolo Bellavista, Irina Rish

TL;DR

This paper surveys knowledge distillation (KD) approaches tailored for federated learning (FL), focusing on how they address the limitations of FedAvg-like algorithms: the requirement that all clients share the same model architecture, degraded performance on non-IID data, and high communication cost. It introduces a taxonomy of KD-based FL methods, detailing server-side and client-side mechanisms, including data-free generation and intermediate-feature sharing, and analyzes their trade-offs, scalability, and privacy implications. The review covers enhancements to FedAvg via ensemble distillation, federated adaptations of codistillation, and regularization strategies, with extensions to personalization, unlearning, and class-incremental learning. The work provides adoption guidelines and highlights promising directions for practical deployment and future research in KD-enabled FL.

Abstract

Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. The most widely used FL algorithms are parameter-averaging schemes (e.g., Federated Averaging), which have well-known limitations: they require homogeneous models, incur high communication cost, and perform poorly in the presence of heterogeneous data distributions. Federated adaptations of regular Knowledge Distillation (KD) can solve or mitigate these weaknesses, while possibly introducing other trade-offs. In this article, we present a focused review of the state-of-the-art KD-based algorithms specifically tailored for FL, providing both a novel classification of the existing approaches and a detailed technical description of their pros, cons, and trade-offs.

Paper Structure

This paper contains 27 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Taxonomy of KD-based solutions for FL issues.
  • Figure 2: KD-based solutions for FL issues. (a) FedAvg, (b) statistic-based federated codistillation (CD), (c) response-based federated CD.
  • Figure 3: Local-global distillation using a regularization term. $w_{t}$ represents the global model at round $t$. $w_{t+1}^k$ is the local model.
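
The local-global distillation named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes one common formulation, which is not necessarily the paper's exact equation: the client's local objective is its usual cross-entropy plus a weighted KL-divergence term that pulls the local model's softened predictions toward those of the global model $w_{t}$. The function names, the weight `lam`, and the `temperature` value are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def local_global_kd_loss(local_logits, global_logits, label,
                         lam=0.5, temperature=2.0):
    """Per-sample local loss with a distillation regularizer (assumed form).

    local_logits:  output of the local model being trained (w_{t+1}^k)
    global_logits: output of the frozen global model (w_t) on the same input
    lam:           hypothetical weight of the regularization term
    """
    task_loss = cross_entropy(softmax(local_logits), label)
    teacher = softmax(global_logits, temperature)  # global model as teacher
    student = softmax(local_logits, temperature)   # local model as student
    return task_loss + lam * kl_divergence(teacher, student)


if __name__ == "__main__":
    # The regularizer vanishes when the local and global outputs agree
    # and grows as the local model drifts away from the global one.
    print(local_global_kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0))
    print(local_global_kd_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], label=0))
```

In this sketch the global model is only evaluated, never updated, during local training; a larger `lam` trades local fit for less client drift.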