LEFL: Low Entropy Client Sampling in Federated Learning

Waqwoya Abebe; Pablo Munoz; Ali Jannesari

TL;DR

The paper tackles data heterogeneity in federated learning by proposing LEFL, a privacy-preserving approach that clusters clients based on high-level features learned by their local models. A one-time preprocessing step uses a small unlabeled public dataset to generate soft labels, from which a pairwise KL divergence matrix is built to guide unsupervised clustering of clients into strata; the server then performs stratified sampling across these strata in every round. Empirical results on CIFAR-10/100 and EMNIST show that LEFL reduces gradient noise, speeds up convergence, and lowers communication overhead, with accuracy improvements of up to 7.4 percentage points and substantial reductions in the number of rounds. The method achieves these gains without exposing private data, at the cost of distributing a public dataset and computing and communicating soft labels, offering a practical privacy-preserving enhancement to FL training.
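The preprocessing step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetrised, per-sample-averaged KL divergence, the farthest-point medoid clustering, and the names `pairwise_kl` and `cluster` are all assumptions made for illustration, since the summary does not specify these details.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def pairwise_kl(soft_labels):
    """soft_labels[c][n] is client c's class distribution on public sample n.
    Assumption: divergence between two clients is the symmetrised KL,
    averaged over the public samples."""
    m = len(soft_labels)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            pairs = list(zip(soft_labels[i], soft_labels[j]))
            d = sum(0.5 * (kl(p, q) + kl(q, p)) for p, q in pairs) / len(pairs)
            D[i][j] = D[j][i] = d
    return D

def cluster(D, k):
    """Stand-in clustering (hypothetical choice): pick k medoids by
    farthest-point traversal, then assign each client to its nearest medoid."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(D)),
                           key=lambda c: min(D[c][m] for m in medoids)))
    return [min(range(k), key=lambda s: D[c][medoids[s]])
            for c in range(len(D))]

# Toy input: 4 clients, 2 public samples, 3 classes.
soft_labels = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # client 0: favours class 0
    [[0.85, 0.10, 0.05], [0.90, 0.05, 0.05]],  # client 1: favours class 0
    [[0.05, 0.05, 0.90], [0.10, 0.10, 0.80]],  # client 2: favours class 2
    [[0.10, 0.05, 0.85], [0.05, 0.05, 0.90]],  # client 3: favours class 2
]
D = pairwise_kl(soft_labels)
strata = cluster(D, k=2)
print(strata)
```

In this toy setting, clients whose models produce similar soft labels end up in the same stratum. Any clustering method that accepts a precomputed dissimilarity matrix could be substituted for the stand-in used here.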

Abstract

Federated learning (FL) is a machine learning paradigm in which multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates the FL training process through a series of training rounds. In each round, the server samples clients from a client pool and sends them its latest global model parameters for further optimization. Naive strategies sample clients at random and, for privacy reasons, do not take client data distributions into account. We therefore propose LEFL, an alternative sampling strategy that performs a one-time clustering of clients based on the high-level features learned by their models, while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show that the datasets of clients sampled with this approach yield a low relative entropy with respect to the global data distribution. Consequently, FL training becomes less noisy, and the convergence of the global model improves significantly, by as much as 7.4% in some experiments. The approach also significantly reduces the number of communication rounds required to reach a target accuracy.
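To make the low-relative-entropy claim concrete, the following sketch contrasts a stratified draw with a draw taken entirely from one cluster. It is an illustration under stated assumptions, not the paper's procedure: the round-robin allocation across strata, the function names, and the toy label counts are hypothetical, and the per-client label counts are used here only for offline evaluation, since the server would not see them in practice.

```python
import math
import random
from collections import defaultdict

def stratified_sample(strata, n, rng):
    """Pick n clients by cycling over strata and drawing one unused client
    from each in turn (assumed allocation rule)."""
    pools = defaultdict(list)
    for client, s in enumerate(strata):
        pools[s].append(client)
    for members in pools.values():
        rng.shuffle(members)
    chosen = []
    while len(chosen) < n and any(pools.values()):
        for s in sorted(pools):
            if pools[s] and len(chosen) < n:
                chosen.append(pools[s].pop())
    return chosen

def relative_entropy(sampled_counts, global_counts, eps=1e-12):
    """KL(sample || global) between the pooled label histogram of the
    sampled clients and the global label histogram."""
    sample = [sum(col) for col in zip(*sampled_counts)]
    s_tot, g_tot = sum(sample), sum(global_counts)
    return sum((s / s_tot) * math.log((s / s_tot + eps) / (g / g_tot + eps))
               for s, g in zip(sample, global_counts) if s > 0)

# Toy setup: two strata, each holding clients with a single class.
strata = [0, 0, 1, 1]
counts = [[100, 0], [100, 0], [0, 100], [0, 100]]  # per-client label counts
global_counts = [sum(col) for col in zip(*counts)]

chosen = stratified_sample(strata, n=2, rng=random.Random(0))
kl_stratified = relative_entropy([counts[c] for c in chosen], global_counts)

# A draw that happens to take both clients from stratum 0.
kl_skewed = relative_entropy([counts[0], counts[1]], global_counts)
print(chosen, kl_stratified, kl_skewed)
```

Here the stratified draw covers both strata, so its pooled label histogram matches the global one and the relative entropy is zero, whereas the single-stratum draw gives log 2. Random sampling may or may not produce such a skewed draw in a given round; stratification rules it out by construction.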

Paper Structure (17 sections, 4 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Related Works
Aggregation Techniques
Sampling Techniques
Methodology
Assumption 1.
Assumption 2.
Conjecture 1:
Conjecture 2:
Evaluation
Testing Conjectures
Federated Learning Experiments
Convergence Analysis
Communication Overhead
Ablation Study
...and 2 more sections

Figures (6)

Figure 1: Server-side computation after receiving soft labels from clients (Algorithm 1). The server constructs a similarity matrix and uses it to cluster the clients. Afterwards, it conducts stratified client sampling across clusters in each FL round.
Figure 2: Comparing CKA values for every layer of a pair of randomly selected models. Models 1a and 1b belong to the same cluster, whereas model 2a comes from a different cluster. The red diagonal highlights the region where model layers intersect.
Figure 3: Average Euclidean distance of latent space vectors among clients within stratified clusters vs. random clusters. Details of experiments A - F will be discussed in supplementary materials.
Figure 4: Comparing average relative entropy values of stratified samples vs. random samples over 500 communication rounds. The figure subtitles summarize the experiment settings (model - dataset - number of clients @ client sampling ratio).
Figure 5: Comparing performance of our work (LEFL) against FedAvg, FedProx, FedNova, and SCAFFOLD. Each experiment was conducted on a specified dataset using a predefined number of clients with a certain model and client sampling ratio. The figure subtitles summarize the experiment settings (model - dataset - number of clients @ client sampling ratio).
...and 1 more figures
