GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
Tuo Zhang, Tiantian Feng, Samiul Alam, Dimitrios Dimitriadis, Sunwoo Lee, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr
TL;DR
GPT-FL is a decoupled federated learning framework that builds prompts from label names, uses pre-trained generative models to turn those prompts into diversified synthetic data, trains a downstream model on the server with this data, and then fine-tunes that model on private client data under standard FL. The approach achieves higher accuracy, lower communication costs, and improved client-sampling efficiency across image and audio modalities, while remaining compatible with secure aggregation and requiring no extra FL hyperparameters. Theoretical analysis shows synthetic-data pre-training biases gradients toward the synthetic distribution, reducing variance and accelerating convergence, with empirical results corroborating faster training and better generalization. Overall, GPT-FL offers a practical, versatile enhancement to FL by leveraging foundation models for data augmentation and server-side pre-training, applicable across diverse data modalities and tasks.
Abstract
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis across various data modalities, we discover that the downstream model trained on synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Moreover, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or solely with synthetic data. The code is available at https://github.com/AvestimehrResearchGroup/GPT-FL.
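
To make the three stages described above concrete (prompt-driven synthetic data generation, server-side training, then federated fine-tuning), the following is a minimal toy sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the prompt template, the `fake_generator` stub that stands in for a pre-trained generative model, the 2-D Gaussian "features", the linear model trained with squared loss, and the use of plain FedAvg as the FL algorithm.

```python
import random

LABELS = ["cat", "dog"]                 # illustrative label names (assumed)
TARGET = {"cat": 1.0, "dog": -1.0}      # +1 / -1 targets for a linear classifier

def make_prompt(label):
    # Hypothetical template; the prompts actually used by GPT-FL are not given here.
    return f"a photo of a {label}"

def fake_generator(prompt, n, rng):
    # Stand-in for a pre-trained generative model: samples 2-D "features"
    # around a slightly shifted class center to mimic synthetic/real domain gap.
    center = (1.5, 2.5) if "cat" in prompt else (-1.5, -2.5)
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]

def real_samples(label, n, rng):
    # Private client data drawn around the (assumed) true class centers.
    center = (2.0, 2.0) if label == "cat" else (-2.0, -2.0)
    return [([rng.gauss(c, 0.5) for c in center], TARGET[label]) for _ in range(n)]

def sgd(w, data, lr=0.02, epochs=1):
    # Plain SGD on squared error for the linear model w . x.
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fedavg(w, clients, rounds=5):
    # FedAvg: clients train locally from the global weights,
    # then the server averages the results weighted by dataset size.
    for _ in range(rounds):
        local = [(sgd(w, d), len(d)) for d in clients]
        total = sum(n for _, n in local)
        w = [sum(lw[i] * n for lw, n in local) / total for i in range(len(w))]
    return w

def accuracy(w, data):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0) for x, y in data)
    return hits / len(data)

def run_pipeline(seed=0):
    rng = random.Random(seed)
    # Stage 1: build prompts from label names and generate synthetic data.
    synthetic = [(x, TARGET[label]) for label in LABELS
                 for x in fake_generator(make_prompt(label), 100, rng)]
    rng.shuffle(synthetic)
    # Stage 2: train the downstream model on the server with synthetic data only.
    w_server = sgd([0.0, 0.0], synthetic, epochs=3)
    # Stage 3: fine-tune with private client data, starting from the server model.
    clients = []
    for _ in range(3):
        data = [s for label in LABELS for s in real_samples(label, 25, rng)]
        rng.shuffle(data)
        clients.append(data)
    w_final = fedavg(w_server, clients)
    return w_server, w_final, clients
```

The design point this sketch tries to capture is the decoupling: the generator is only touched in stage 1, and the federated stage is ordinary FL that merely starts from `w_server` instead of a random initialization, which is why no extra FL hyperparameters appear in `fedavg`.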
