FedMD: Heterogenous Federated Learning via Model Distillation

Daliang Li, Junpu Wang

TL;DR

This paper tackles heterogeneous federated learning, where each participant independently designs its own model. It introduces FedMD, a framework that uses transfer learning and knowledge distillation to enable collaboration via a public dataset, without sharing private data or architectures. Participants exchange logits computed on public data, and a central consensus is distilled back into each participant's model. With 10 participants, FedMD improves each model's final test accuracy by about 20% on average over training without collaboration and comes within a few percent of the pooled-data baseline on MNIST/FEMNIST and CIFAR10/CIFAR100. The approach offers a practical path to privacy-preserving, model-diverse collaboration in settings such as healthcare and AI as a service.
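
The exchange-and-distill loop summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the consensus is a plain average of the participants' class scores on shared public samples, that each participant is a toy linear softmax model (real participants would have different architectures), and that distillation is plain gradient descent on a soft-target cross-entropy. The names `consensus`, `digest_step`, and `soft_cross_entropy` are hypothetical, and the follow-up training on private data is omitted.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consensus(class_scores):
    # Assumption: the server averages per-participant scores on the same public samples.
    return np.mean(np.stack(class_scores, axis=0), axis=0)

def soft_cross_entropy(W, x, target_probs):
    # Cross-entropy between the consensus (soft targets) and the model's predictions.
    probs = softmax(x @ W)
    return float(-(target_probs * np.log(probs + 1e-12)).sum(axis=1).mean())

def digest_step(W, x_public, target_probs, lr=0.5):
    # One gradient step pulling a linear softmax model toward the consensus.
    probs = softmax(x_public @ W)
    grad = x_public.T @ (probs - target_probs) / len(x_public)
    return W - lr * grad

rng = np.random.default_rng(0)
x_public = rng.normal(size=(64, 5))                     # synthetic stand-in for a public dataset
weights = [rng.normal(size=(5, 3)) for _ in range(4)]   # four toy participants, three classes

# Each participant shares only its class scores on the public data.
scores = [softmax(x_public @ W) for W in weights]
target = consensus(scores)

# Participant 0 distills the consensus into its own model.
loss_before = soft_cross_entropy(weights[0], x_public, target)
W_new = weights[0]
for _ in range(20):
    W_new = digest_step(W_new, x_public, target)
loss_after = soft_cross_entropy(W_new, x_public, target)
```

Only the arrays in `scores` leave a participant in this sketch. The weights and any private data stay local, which is what lets participants with different architectures collaborate.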

Abstract

Federated learning enables the creation of a powerful centralized model without compromising the data privacy of multiple participants. While successful, it does not incorporate the case where each participant independently designs its own model. Due to intellectual property concerns and the heterogeneous nature of tasks and data, this is a widespread requirement in applications of federated learning to areas such as health care and AI as a service. In this work, we use transfer learning and knowledge distillation to develop a universal framework that enables federated learning when each agent owns not only its private data, but also a uniquely designed model. We test our framework on the MNIST/FEMNIST dataset and the CIFAR10/CIFAR100 dataset and observe fast improvement across all participating models. With 10 distinct participants, the final test accuracy of each model on average receives a 20% gain on top of what's possible without collaboration and is only a few percent lower than the performance each model would have obtained if all private datasets were pooled and made directly available to all participants.

Paper Structure

This paper contains 8 sections, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: A general framework for heterogeneous federated learning. Each agent owns a private dataset and a uniquely designed model. To communicate and collaborate without data leakage, the agents need to translate their learned knowledge to a standard format. A central server collects this knowledge and computes a consensus, which is distributed across the network. In this work, the translator is implemented using knowledge distillation.
  • Figure 2: FedMD improves the test accuracy of participating models beyond their baselines. A dashed line (on the left) represents the test accuracy of a model after full transfer learning with the public dataset and its own small private dataset. This baseline is our starting point and overlaps with the beginning of the corresponding learning curve. A dash-dot line (on the right) represents the would-be performance of a model if private datasets from all participants were declassified and made available to every participant of the group.