Recovering Global Data Distribution Locally in Federated Learning
Ziyu Yao
TL;DR
The paper tackles label distribution skew in Federated Learning by proposing ReGL, a framework that recovers the global data distribution locally on each client. It uses foundation generative models to synthesize images for minority and missing classes in a training-free manner, and further aligns the synthetic images with local data through adaptive LoRA-based fine-tuning with multimodal conditioning. By training on a mix of real and synthetic data, ReGL lets FedAvg-style aggregation reach near-centralized performance in global generalization while also improving personalization, and it outperforms state-of-the-art baselines across multiple datasets and skew settings. The gains hold under extreme skew, missing classes, and large numbers of clients, making ReGL a privacy-preserving and scalable solution for FL with label distribution skew.
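The TL;DR says ReGL keeps FedAvg-style aggregation on the server. As a reminder of what that step computes, here is a minimal NumPy sketch of plain FedAvg over flattened parameter vectors. The function name `fedavg` is hypothetical, and counting both real and synthetic images in each client's sample size is an assumption made for illustration, not something the summary specifies.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_weights: list of 1-D numpy arrays, one flattened model per client.
    client_sizes: number of training samples per client. Here we assume this
    counts real plus synthetic images; that choice is illustrative only.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()        # each client's share of all samples
    stacked = np.stack(client_weights)  # shape: (num_clients, num_params)
    return coeffs @ stacked             # weighted sum over the client axis

# Two toy clients; the second holds three times as many samples.
w = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [100, 300])
print(w)  # [0.25 0.75]
```

Weighting by sample count is the standard FedAvg choice; once each client's local data has been rebalanced, the averaged update is less pulled toward whichever classes dominate on individual clients.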
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a shared model without sharing raw data. However, a major challenge in FL is label imbalance: some clients may exclusively possess certain classes, while each client may also have many minority or entirely missing classes. Previous work focuses on optimizing local updates or global aggregation but ignores the underlying imbalanced label distribution across clients. In this paper, we propose a novel approach, ReGL, to address this challenge; its key idea is to Recover the Global data distribution Locally. Specifically, each client uses generative models to synthesize images that complement its minority and missing classes, thereby alleviating label imbalance. Moreover, we adaptively fine-tune the image generation process on local real data, which brings the synthetic images closer to the global distribution. Importantly, both generation and fine-tuning are performed on the client side, so no private data leaves the client. Through comprehensive experiments on various image classification datasets, we demonstrate that our approach clearly outperforms existing state-of-the-art methods by tackling label imbalance in FL at its source.
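The abstract describes each client synthesizing images to complement its minority and missing classes. One simple way to decide how many images to request per class is sketched below. It assumes the target global distribution is balanced and defaults the per-class target to the largest local class count; both assumptions, and the helper name `synthetic_quota`, are hypothetical and not ReGL's documented allocation policy.

```python
from collections import Counter

def synthetic_quota(local_labels, num_classes, target_per_class=None):
    """Number of synthetic images to generate per class so that the local
    label histogram becomes uniform.

    Assumption (illustrative only): the global distribution is balanced, and
    by default every class is topped up to the largest local class count.
    """
    counts = Counter(local_labels)
    if target_per_class is None:
        target_per_class = max(counts.values(), default=0)
    # Missing classes (count 0) get the full target, minority classes get
    # only the gap, and classes already at or above the target get nothing.
    return {c: max(target_per_class - counts.get(c, 0), 0)
            for c in range(num_classes)}

# A client with 5 images of class 0, 2 of class 1, and none of class 2.
quota = synthetic_quota([0] * 5 + [1] * 2, num_classes=3)
print(quota)  # {0: 0, 1: 3, 2: 5}
```

The returned counts would then be passed, class by class, to whatever generator the client uses; the sketch covers only the bookkeeping, not the image synthesis or LoRA fine-tuning.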
