Federated Generative Learning with Foundation Models
Jie Zhang, Xiaohua Qi, Bo Zhao
TL;DR
Federated Generative Learning (FGL) reframes federated training: instead of exchanging model parameters, clients send privacy-friendly text embeddings derived from their local data to the server, where a foundation diffusion model uses them to synthesize a substitute training set for centralized model training. This approach reduces communication rounds, mitigates data heterogeneity, and strengthens privacy protection. It is evaluated on 12 diverse datasets, including ImageNet subsets, DomainNet, and medical and satellite imagery. The paper shows that one-shot training on synthetic data can outperform FedAvg run for hundreds of communication rounds in many settings, while multi-round variants that use only a few rounds together with synthetic-data fine-tuning further boost performance under highly skewed distributions. Overall, FGL offers a practical path to scalable, privacy-preserving FL by leveraging prompt-driven data synthesis with powerful foundation models, supported by thorough ablations on prompts, generators, and data-domain challenges.
Abstract
Existing approaches in Federated Learning (FL) mainly focus on sending model parameters or gradients from clients to a server. However, these methods suffer from significant inefficiency as well as privacy and security concerns. Building on emerging foundation generative models, we propose a novel federated learning framework, namely Federated Generative Learning. In this framework, each client creates text embeddings tailored to its local data and sends them to the server. The server then feeds these embeddings to foundation generative models to synthesize informative training data remotely, which benefits FL tasks. Our framework offers several advantages, including improved communication efficiency, robustness to data heterogeneity, substantial performance gains, and enhanced privacy protection. We validate these benefits through extensive experiments on 12 datasets. For example, on the ImageNet100 dataset with a highly skewed data distribution, our method with a single communication round outperforms FedAvg trained for 200 communication rounds by 12%. We have released the code for all experiments conducted in this study.
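The one-shot protocol described above (clients upload text prompts once, the server synthesizes data and trains centrally) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the authors' implementation: the class-name prompt template "a photo of a {class}", the helper names, the `stub_generator` that replaces a real foundation diffusion model with fake feature vectors, and the nearest-centroid classifier that replaces a deep network are all assumptions chosen only to show the data flow.

```python
from collections import Counter
from dataclasses import dataclass
import random


@dataclass
class Client:
    """Hypothetical client holding private labeled data; only class names are used here."""
    client_id: int
    local_labels: list

    def make_prompts(self) -> list:
        # Assumed class-level prompt template; raw local data never leaves the client.
        return [f"a photo of a {c}" for c in sorted(set(self.local_labels))]


def stub_generator(prompt: str, n: int, rng: random.Random) -> list:
    """Stand-in for a text-to-image foundation model: returns n fake (features, label) pairs."""
    label = prompt.removeprefix("a photo of a ")
    base = float(sum(map(ord, label)) % 10)  # arbitrary per-class offset for the fake features
    return [([base + rng.gauss(0, 0.1) for _ in range(4)], label) for _ in range(n)]


def one_shot_fgl(clients: list, images_per_prompt: int = 5, seed: int = 0):
    rng = random.Random(seed)
    # Single communication round: each client uploads its prompts; the server takes their union.
    prompts = sorted({p for c in clients for p in c.make_prompts()})
    # Server synthesizes a substitute training set from the aggregated prompts.
    synthetic = [s for p in prompts for s in stub_generator(p, images_per_prompt, rng)]
    # Placeholder "centralized training": a nearest-centroid classifier on synthetic data.
    sums, counts = {}, Counter()
    for x, y in synthetic:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return prompts, synthetic, centroids


def predict(centroids: dict, x: list) -> str:
    # Assign x to the class whose centroid is closest in squared Euclidean distance.
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


if __name__ == "__main__":
    clients = [Client(0, ["cat", "cat", "dog"]), Client(1, ["ship"])]  # skewed label split
    prompts, synthetic, centroids = one_shot_fgl(clients)
    print(prompts, len(synthetic), predict(centroids, [2.0] * 4))
```

In this sketch each client sends one small message (its prompts) instead of repeatedly exchanging model weights, and the server's synthetic set covers the union of all clients' classes even when individual clients see only a skewed subset, which mirrors the communication-efficiency and heterogeneity arguments in the abstract.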
