Distilled One-Shot Federated Learning

Yanlin Zhou, George Pu, Xiyao Ma, Xiaolin Li, Dapeng Wu

TL;DR

This work addresses the heavy communication burden and data heterogeneity of federated learning by introducing Distilled One-Shot Federated Learning (DOSFL), which compresses each client's private data into a small set of synthetic examples and trains a global model in a single round. DOSFL leverages dataset distillation with soft labels and adds soft resets and random masking to handle non-IID data, achieving up to three orders of magnitude reduction in communication while retaining 93–99% of centralized performance under IID conditions. The approach is validated across vision and language tasks (CNN, LSTM, Transformer), and a robustness variant (LP-DOSFL) demonstrates improved performance with stragglers and partial participation. Privacy analysis indicates distilled data do not easily reveal initialization information, while acknowledging potential security risks that can be mitigated with standard DP/MPC techniques.

Abstract

Current federated learning algorithms require tens of communication rounds, each transmitting unwieldy model weights, under ideal circumstances, and hundreds of rounds when data are poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance. In just one round, each client distills its private dataset and sends the synthetic data (e.g., images or sentences) to the server, which trains a global model on the collected data. The distilled data look like noise and are useful only for the specific model weights they were distilled against, i.e., they become useless after the model updates. With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than that of FedAvg while preserving 93% to 99% of the performance of a centralized counterpart. Afterwards, clients could switch to traditional methods such as FedAvg to finetune the last few percent and fit personalized local models to their local datasets. Through comprehensive experiments, we show the accuracy and communication performance of DOSFL on both vision and language tasks with different models, including CNN, LSTM, and Transformer. We demonstrate that an eavesdropping attacker cannot train a good model from leaked distilled data without knowing the initial model weights. DOSFL serves as an inexpensive method to quickly converge on a performant pre-trained model with less than 0.1% of the communication cost of traditional methods.
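
To make the communication claim concrete, the following back-of-the-envelope sketch compares client upload volume for FedAvg with a one-shot distilled upload. Every number here (model size, round count, distillation steps, example shape) is a hypothetical placeholder rather than a value from the paper, and counting only client-to-server uploads is an assumption of this sketch.

```python
def fedavg_upload_floats(num_params: int, num_clients: int, rounds: int) -> int:
    """Floats uploaded under FedAvg: each client sends full weights every round."""
    return num_params * num_clients * rounds


def one_shot_upload_floats(num_clients: int, distill_steps: int,
                           examples_per_step: int, example_size: int,
                           num_classes: int) -> int:
    """Floats uploaded once per client: synthetic data, soft labels, learning rates."""
    # Each distillation step carries a batch of synthetic examples, one soft-label
    # vector per example, and one learned learning rate (assumed layout).
    per_step = examples_per_step * (example_size + num_classes) + 1
    return num_clients * distill_steps * per_step


# Hypothetical setting: 1M-parameter model, 10 clients, 100 FedAvg rounds,
# 10 distillation steps of 10 MNIST-sized (784-float) examples with 10 classes.
fedavg = fedavg_upload_floats(num_params=1_000_000, num_clients=10, rounds=100)
one_shot = one_shot_upload_floats(num_clients=10, distill_steps=10,
                                  examples_per_step=10, example_size=784,
                                  num_classes=10)
ratio = one_shot / fedavg
print(f"one-shot / FedAvg upload ratio: {ratio:.6f}")
```

With these placeholder values the ratio falls below 0.1%. The real figure depends on the model, the number of FedAvg rounds needed, and the distillation settings.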

Paper Structure

This paper contains 21 sections, 6 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Distilled One-Shot Federated Learning. (1) The server initializes a model which is broadcast to all clients. (2) Each client distills their private dataset and (3) transmits synthetic data, labels and learning rates to the server. (4) The server fits its model on the distilled data and (5) distributes the final model to all clients.
  • Figure 2: Performance of LP-DOSFL with low participation on Federated MNIST, with soft resets and soft labels, vs. the number of clients distilled.
  • Figure 3: Privacy and security analysis of DOSFL.
  • Figure 4: First step of distilled images from 1 out of 10 clients for IID federated MNIST with no additions (i.e., without soft labels, soft resets, or random masking).
  • Figure 5: First step of distilled images from 1 out of 10 clients for non-IID federated MNIST with no additions (i.e., without soft labels, soft resets, or random masking).
  • ...and 2 more figures
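
The five steps in the Figure 1 caption can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the paper's algorithm: the model is a single scalar weight for 1-D linear regression, each client learns only one soft label for a fixed synthetic input with a fixed learning rate, gradients are written out by hand, and the server combines clients by averaging their synthetic-data gradients in one step. All function names are hypothetical.

```python
import random


def make_client_data(rng, slope, n=50):
    """Private 1-D regression data y = slope * x + small noise (toy task)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0.0, 0.01)) for x in xs]


def mse(w, data):
    """Mean squared error of the scalar model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def inner_step(w0, x_s, y_s, lr):
    """One gradient step on the synthetic pair for the loss (w * x_s - y_s)^2."""
    return w0 - lr * 2.0 * x_s * (w0 * x_s - y_s)


def distill(w0, data, x_s=1.0, lr=0.5, outer_lr=0.5, iters=200):
    """Client: learn a soft label y_s so one step from w0 fits the private data."""
    y_s = 0.0
    for _ in range(iters):
        w1 = inner_step(w0, x_s, y_s, lr)
        # Chain rule through the inner step: dL/dy_s = dL/dw1 * dw1/dy_s.
        dL_dw1 = sum(2.0 * x * (w1 * x - y) for x, y in data) / len(data)
        dw1_dy = 2.0 * lr * x_s
        y_s -= outer_lr * dL_dw1 * dw1_dy
    return x_s, y_s, lr


def server_fit(w0, distilled):
    """Server: one averaged gradient step over all received synthetic pairs."""
    lr = distilled[0][2]  # every client uses the same fixed lr in this sketch
    grad = sum(2.0 * x_s * (w0 * x_s - y_s) for x_s, y_s, _ in distilled)
    return w0 - lr * grad / len(distilled)


rng = random.Random(0)
w0 = rng.gauss(0.0, 1.0)                                         # (1) init + broadcast
clients = [make_client_data(rng, slope=3.0) for _ in range(5)]   # IID private data
distilled = [distill(w0, data) for data in clients]              # (2) distill, (3) upload
w_final = server_fit(w0, distilled)                              # (4) fit on distilled data
print(f"w0={w0:.3f}  w_final={w_final:.3f}")                     # (5) w_final is distributed
```

Each client uploads three floats instead of its 50 private pairs, and the server never sees raw data. The toy also omits soft resets and random masking, so it says nothing about the non-IID case.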