LEFL: Low Entropy Client Sampling in Federated Learning
Waqwoya Abebe, Pablo Munoz, Ali Jannesari
TL;DR
The paper tackles data heterogeneity in federated learning by proposing LEFL, a privacy-preserving approach that clusters clients based on the high-level features learned by their local models. A one-time preprocessing step uses a small unlabeled public dataset to generate soft labels; a pairwise KL divergence matrix computed from these soft labels guides unsupervised clustering of clients into strata, which enables stratified sampling in every round. Empirical results on CIFAR-10/100 and EMNIST show that LEFL reduces gradient noise, speeds up convergence, and lowers communication overhead, with accuracy improvements of up to 7.4 percentage points and substantial reductions in the number of rounds. The method achieves these gains without exposing private data, at the cost of requiring a public dataset and of computing and communicating soft labels, offering a practical privacy-preserving enhancement to FL training.
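The preprocessing step can be illustrated with a minimal sketch. It assumes that each client's soft labels are a list of per-sample class probability vectors on the shared public dataset, that the client-to-client distance is the mean symmetrised KL divergence over those samples, and that strata come from average-linkage agglomerative clustering. The function names (`pairwise_kl`, `cluster`), the symmetrisation, and the choice of clustering algorithm are assumptions made for illustration, not details taken from the paper.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def pairwise_kl(soft_labels):
    """soft_labels[c][n] is client c's predicted distribution on public sample n.

    Returns a symmetric matrix of mean symmetrised KL divergences (an assumed
    choice of distance; the source only states that a pairwise KL matrix is used).
    """
    k = len(soft_labels)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            pairs = list(zip(soft_labels[i], soft_labels[j]))
            fwd = sum(kl(p, q) for p, q in pairs) / len(pairs)
            bwd = sum(kl(q, p) for p, q in pairs) / len(pairs)
            dist[i][j] = dist[j][i] = 0.5 * (fwd + bwd)
    return dist

def cluster(dist, num_strata):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_strata:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy example: 4 clients, 2 public samples, 2 classes.
# Clients 0 and 1 lean towards class 0; clients 2 and 3 lean towards class 1.
soft_labels = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.1, 0.9], [0.2, 0.8]],
    [[0.2, 0.8], [0.1, 0.9]],
]
strata = cluster(pairwise_kl(soft_labels), num_strata=2)
```

In this toy setting, clients whose models produce similar soft labels end up in the same stratum, and only the soft labels (never the private data) are needed to build the matrix.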
Abstract
Federated learning (FL) is a machine learning paradigm in which multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates the FL training process through a series of training rounds. In each round, the server samples clients from a client pool and sends them its latest global model parameters for further optimization. Naive strategies sample clients at random and, for privacy reasons, do not take client data distributions into account. We therefore propose LEFL, an alternative sampling strategy that performs a one-time clustering of clients based on the high-level features learned by their models, while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show that the combined datasets of clients sampled with this approach have low relative entropy with respect to the global data distribution. Consequently, FL training becomes less noisy, and the convergence of the global model improves significantly, by as much as 7.4% in some experiments. The approach also significantly reduces the number of communication rounds required to reach a target accuracy.
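The relative entropy claim can be made concrete with a small sketch. It assumes the strata are already known, draws one client per stratum, and compares the pooled label distribution of the sampled clients with the global label distribution using KL divergence. The per-client label counts, the helper names, and the one-client-per-stratum rule are hypothetical and chosen for illustration; in a real deployment the server would not see these counts, so this is only an offline analysis tool.

```python
import math
import random

def stratified_sample(strata, per_stratum=1, rng=None):
    """Draw `per_stratum` clients from every stratum (fewer if a stratum is small)."""
    rng = rng or random.Random()
    picked = []
    for members in strata:
        picked.extend(rng.sample(members, min(per_stratum, len(members))))
    return picked

def label_distribution(client_counts, clients):
    """Pooled class distribution over the given clients' label counts."""
    num_classes = len(next(iter(client_counts.values())))
    totals = [sum(client_counts[c][k] for c in clients) for k in range(num_classes)]
    n = sum(totals)
    return [t / n for t in totals]

def relative_entropy(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical label counts per client (two classes), with non-IID skew.
client_counts = {0: [100, 0], 1: [90, 10], 2: [0, 100], 3: [10, 90]}
strata = [[0, 1], [2, 3]]  # assumed output of the one-time clustering step

global_dist = label_distribution(client_counts, list(client_counts))
sampled = stratified_sample(strata, per_stratum=1, rng=random.Random(0))
stratified_kl = relative_entropy(label_distribution(client_counts, sampled), global_dist)

# An unlucky random draw may pick both clients from the same stratum.
skewed_kl = relative_entropy(label_distribution(client_counts, [0, 1]), global_dist)
```

With these toy counts, any stratified draw yields a pooled distribution close to the global one, whereas a draw confined to a single stratum diverges from it by a much larger margin.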
