One-shot Federated Learning via Synthetic Distiller-Distillate Communication

Junyuan Zhang, Songhua Liu, Xinchao Wang

TL;DR

FedSD2C tackles the accuracy gap in one-shot federated learning by replacing inconsistent client models with synthetic distillates built from local Core-Sets. A V-information-based Core-Set selection captures diverse local information; this information is privacy-protected via Fourier amplitude perturbation and refined by a pre-trained Autoencoder so that the distillates stay aligned with the original data distribution. The server trains on the decoded distillates, which mitigates the two-step information loss, reduces communication cost, and preserves privacy. Across multiple datasets, FedSD2C consistently outperforms existing one-shot FL methods, especially under high data heterogeneity, and it is robust to the choice of model architecture while its communication cost scales well.
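
The Core-Set selection above is driven by V-information, i.e. the information about a label that a restricted model family can actually extract from an input. The following is a minimal sketch, assuming each candidate sample is scored by its pointwise V-information (the log-probability of the true label given the input, minus the log-probability given no input, here approximated by a class prior) and that the highest-scoring samples per class form the Core-Set. The probabilities are toy values, `select_core_set` is a hypothetical helper, and the diversity mechanism mentioned in the TL;DR is not modelled; FedSD2C's actual criterion may differ.

```python
import math
from collections import defaultdict


def pointwise_v_information(p_true_given_x: float, p_true_given_null: float) -> float:
    """Pointwise V-information (in bits) of one labelled sample.

    PVI(x -> y) = log2 p_x(y) - log2 p_null(y), where p_x(y) is a model's
    probability of the true label given the input and p_null(y) is the
    probability assigned without the input (assumed here: a class prior).
    """
    return math.log2(p_true_given_x) - math.log2(p_true_given_null)


def select_core_set(samples, class_prior, k_per_class):
    """Hypothetical helper: keep the k samples per class with the highest PVI.

    samples: list of (sample_id, label, p_true_given_x) tuples.
    class_prior: dict mapping label -> prior probability p_null(y).
    """
    by_class = defaultdict(list)
    for sample_id, label, p_x in samples:
        score = pointwise_v_information(p_x, class_prior[label])
        by_class[label].append((score, sample_id))
    core_set = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)  # highest PVI first
        core_set[label] = [sample_id for _, sample_id in scored[:k_per_class]]
    return core_set


# Toy usage: model confidences for the true label of five local samples.
samples = [("a", 0, 0.9), ("b", 0, 0.3), ("c", 0, 0.6), ("d", 1, 0.8), ("e", 1, 0.2)]
prior = {0: 0.5, 1: 0.5}
print(select_core_set(samples, prior, k_per_class=2))  # {0: ['a', 'c'], 1: ['d', 'e']}
```

Samples whose label the model predicts more confidently than the prior receive positive scores; a negative score means the input made the true label look less likely than guessing from the prior alone.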

Abstract

One-shot federated learning (FL) is a powerful technique for collaboratively training machine learning models in a single round of communication. Although it offers better communication efficiency and privacy preservation than iterative FL, one-shot FL often sacrifices model performance. Prior research has primarily used data-free knowledge distillation (DFKD) to optimize data generators and ensemble models that better aggregate local knowledge into the server model. However, these methods typically struggle with data heterogeneity, where inconsistent local data distributions can cause teachers to provide misleading knowledge. They may also scale poorly to complex datasets because of an inherent two-step information loss: first during local training (from data to model), and second when transferring knowledge to the server model (from model to inverted data). In this paper, we propose FedSD2C, a novel and practical one-shot FL framework designed to address these challenges. FedSD2C introduces a distiller that synthesizes informative distillates directly from local data to reduce information loss, and it shares synthetic distillates instead of inconsistent local models to tackle data heterogeneity. Our empirical results show that FedSD2C consistently outperforms other one-shot FL methods on more complex, real-world datasets, achieving up to 2.6× the performance of the best baseline. Code: https://github.com/Carkham/FedSD2C
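
Because one-shot FL communicates in a single round, the size of that one upload matters. Below is a back-of-the-envelope comparison between uploading a local model and uploading latent distillates, under hypothetical settings: the model is taken to be torchvision's ImageNet ResNet-18 (11,689,512 parameters), while the distillate count and the 4 × 28 × 28 latent shape are illustrative assumptions, not figures from the paper. All values are assumed to be float32.

```python
def payload_bytes(num_values: int, bytes_per_value: int = 4) -> int:
    """Upload size in bytes for num_values numbers (float32 = 4 bytes each)."""
    return num_values * bytes_per_value


# Option A: upload a full local model. 11,689,512 is the parameter count of
# torchvision's ImageNet ResNet-18; a client's actual model may differ.
RESNET18_PARAMS = 11_689_512
model_upload = payload_bytes(RESNET18_PARAMS)

# Option B (hypothetical setting): upload 1,000 distillates, each stored as a
# 4 x 28 x 28 autoencoder latent, e.g. a 224 x 224 image downsampled 8x.
num_distillates = 1_000
latent_shape = (4, 28, 28)
latent_values = latent_shape[0] * latent_shape[1] * latent_shape[2]
distillate_upload = payload_bytes(num_distillates * latent_values)

print(f"model:       {model_upload / 1e6:.2f} MB")
print(f"distillates: {distillate_upload / 1e6:.2f} MB")
print(f"ratio:       {model_upload / distillate_upload:.2f}x")
```

The comparison is only indicative: the distillate payload grows linearly with the number of distillates a client sends, so the ratio depends entirely on that choice and on the latent size.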

Paper Structure

This paper contains 28 sections, 8 equations, 4 figures, 10 tables, 2 algorithms.

Figures (4)

  • Figure 1: Illustration of issues in one-shot FL based on DFKD: (1) Information loss occurs during the transfer from local data to the model and from the model back to the inverted data. (2) t-SNE plots of feature distributions of data generated by DENSE (left, $\blacktriangle$), Co-Boosting (middle, $\blacksquare$), and our FedSD2C (right, $\bigstar$). We randomly select five classes (indicated by different colors) of real and synthetic data from Tiny-ImageNet. Bad samples are data generated by DFKD-based methods that deviate from the distribution of the local real data.
  • Figure 2: Framework of proposed FedSD2C.
  • Figure 3: (a) Experiments on the medical image domain. Using Autoencoders pre-trained on other data domains can reduce performance, but this can be mitigated by increasing $T_{syn}$. (b) FedSD2C with randomly initialized downsampling and upsampling modules (blue line) compared with pre-trained Autoencoders (orange line) on ImageNette. Without pre-trained knowledge, FedSD2C requires a larger $T_{syn}$ for distillate synthesis but still achieves comparable results. ResNet-18 is used in both experiments.
  • Figure S1: Visualization of a synthetic distillate reconstructed by the pre-trained Autoencoder, compared with the original sample, on Tiny-ImageNet. The image style is preserved, while privacy protection is enhanced.
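
Figure S1 shows distillates that keep the original image style while offering stronger privacy protection; the TL;DR attributes this protection to Fourier amplitude perturbation. The following is a minimal sketch, assuming the perturbation linearly mixes an image's amplitude spectrum with that of another image (`other`) while keeping the image's own phase. The mixing rule, the weight `lam`, and all function names are assumptions for illustration, and FedSD2C's exact perturbation may differ. A naive pure-Python DFT keeps the example dependency-free; practical code would use an FFT library.

```python
import cmath
import math


def dft2(grid):
    """Naive 2-D discrete Fourier transform of an H x W list of lists."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def idft2(spectrum):
    """Inverse of dft2; returns complex values."""
    h, w = len(spectrum), len(spectrum[0])
    return [
        [
            sum(
                spectrum[u][v] * cmath.exp(2j * math.pi * (u * y / h + v * x / w))
                for u in range(h)
                for v in range(w)
            ) / (h * w)
            for x in range(w)
        ]
        for y in range(h)
    ]


def perturb_amplitude(image, other, lam):
    """Hypothetical rule: mix amplitudes of `image` and `other`, keep the phase of `image`."""
    f_img, f_other = dft2(image), dft2(other)
    mixed = [
        [((1 - lam) * abs(a) + lam * abs(b)) * cmath.exp(1j * cmath.phase(a))
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(f_img, f_other)
    ]
    # Spectra of real images are conjugate-symmetric, so the inverse is real
    # up to floating-point error; keep only the real part.
    return [[value.real for value in row] for row in idft2(mixed)]


image = [[0.1, 0.4, 0.7, 0.2], [0.5, 0.9, 0.3, 0.6], [0.8, 0.2, 0.4, 0.1], [0.3, 0.6, 0.5, 0.9]]
other = [[0.9, 0.1, 0.2, 0.8], [0.3, 0.3, 0.7, 0.5], [0.6, 0.4, 0.1, 0.2], [0.7, 0.5, 0.8, 0.0]]
protected = perturb_amplitude(image, other, lam=0.3)
print([[round(v, 3) for v in row] for row in protected])
```

A common rationale for this design is that the phase spectrum carries most of an image's structure while the amplitude spectrum carries much of its low-level appearance, so editing only the amplitude obscures the sample without destroying its content. With `lam=0` the image is returned unchanged, and larger values move its amplitude toward that of `other`.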