FedMD: Heterogenous Federated Learning via Model Distillation
Daliang Li, Junpu Wang
TL;DR
This paper tackles heterogeneous federated learning, where each participant independently designs its own model. It introduces FedMD, a framework that uses transfer learning and knowledge distillation to enable collaboration via a public dataset, without sharing private data or model architectures. Participants exchange logits computed on the public data, and a central consensus is distilled back into each participant's model. On MNIST/FEMNIST and CIFAR10/CIFAR100 with 10 participants, FedMD raises each model's final test accuracy by about 20% on average over training without collaboration, and lands only a few percent below the pooled-data baseline. The approach offers a practical path to privacy-preserving, model-diverse collaboration in areas such as health care and AI as a service.
Abstract
Federated learning enables the creation of a powerful centralized model without compromising the data privacy of multiple participants. While successful, it does not cover the case where each participant independently designs its own model. Due to intellectual property concerns and the heterogeneous nature of tasks and data, this is a widespread requirement in applications of federated learning to areas such as health care and AI as a service. In this work, we use transfer learning and knowledge distillation to develop a universal framework that enables federated learning when each agent owns not only its private data, but also a uniquely designed model. We test our framework on the MNIST/FEMNIST and CIFAR10/CIFAR100 datasets and observe fast improvement across all participating models. With 10 distinct participants, the final test accuracy of each model receives, on average, a 20% gain on top of what is possible without collaboration, and is only a few percent lower than the performance each model would have obtained if all private datasets were pooled and made directly available to all participants.
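The collaboration loop summarized above (share class scores on the public dataset, form a central consensus, distill it back into each participant) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the consensus is assumed here to be a plain average of the participants' class scores, and `digest_step` is a toy stand-in that nudges a score table toward the consensus, where a real participant would train its own model on the public data to match it.

```python
from typing import List

# Class scores of one participant on the public dataset:
# scores[i][c] is the score for public sample i and class c.
Scores = List[List[float]]


def consensus(all_scores: List[Scores]) -> Scores:
    """Average class scores across participants (assumed aggregation rule)."""
    n_models = len(all_scores)
    n_samples = len(all_scores[0])
    n_classes = len(all_scores[0][0])
    return [
        [sum(s[i][c] for s in all_scores) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def digest_step(scores: Scores, target: Scores, lr: float = 0.5) -> Scores:
    """Toy stand-in for distillation: move scores a fraction `lr` toward target.

    A real participant would instead update its own model's weights so that
    its outputs on the public data approach the consensus.
    """
    return [
        [s + lr * (t - s) for s, t in zip(row, target_row)]
        for row, target_row in zip(scores, target)
    ]


def mean_abs_gap(a: Scores, b: Scores) -> float:
    """Mean absolute difference between two score tables."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Two hypothetical participants, two public samples, two classes.
scores_a: Scores = [[2.0, 0.0], [0.0, 4.0]]
scores_b: Scores = [[0.0, 2.0], [2.0, 0.0]]

target = consensus([scores_a, scores_b])
updated_a = digest_step(scores_a, target, lr=0.5)
```

Only class scores on public samples cross the participant boundary in this sketch; private data and model internals are never exchanged, which is the property that allows each participant to keep its own architecture.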
