FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning
Jordan Slessor, Dezheng Kong, Xiaofen Tang, Zheng En Than, Linglong Kong
TL;DR
FedSTaS introduces a dual strategy for federated learning: it stratifies clients by compressed gradient similarity and performs data-level sampling with privacy-preserving sizing. By combining Neyman-allocated client sampling, gradient-norm-based importance sampling, and DP-enabled data sampling, FedSTaS achieves unbiased aggregation, reduced variance, and faster convergence under heterogeneous data. Empirical results on MNIST and CIFAR-100 show that FedSTaS outperforms FedSTS in accuracy and convergence speed, and that the DP-enabled variant remains competitive while providing a privacy guarantee of $\epsilon=3$. The work demonstrates practical gains for efficient and private FL, and outlines avenues for theoretical convergence analysis and enhanced data-level sampling strategies.
Abstract
Federated learning (FL) is a machine learning methodology in which multiple decentralized clients collaboratively train a global model in a privacy-preserving way. Several FL methods have been introduced to tackle communication inefficiencies, but they do not address how to sample participating clients in each round effectively and in a privacy-preserving manner. In this paper, we propose \textit{FedSTaS}, a client- and data-level sampling method inspired by \textit{FedSTS} and \textit{FedSampling}. In each federated learning round, \textit{FedSTaS} stratifies clients based on their compressed gradients, re-allocates the number of clients to sample using an optimal Neyman allocation, and samples local data from each participating client using a uniform data sampling strategy. Experiments on three datasets show that \textit{FedSTaS} can achieve higher accuracy scores than those of \textit{FedSTS} within a fixed number of training rounds.
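
To make the client re-allocation step concrete, the following is a minimal sketch of textbook Neyman allocation, which assigns stratum $h$ a share of the sampling budget proportional to $N_h S_h$, where $N_h$ is the number of clients in the stratum and $S_h$ is a within-stratum variability measure. It is not the authors' implementation: the function name `neyman_allocation`, the largest-remainder rounding, the fallback to proportional allocation, and the choice of $S_h$ (for example, the standard deviation of gradient norms) are all assumptions made for illustration.

```python
import numpy as np

def neyman_allocation(stratum_sizes, stratum_stds, m):
    """Split a budget of m clients across strata in proportion to N_h * S_h.

    stratum_sizes: N_h, the number of clients in each stratum.
    stratum_stds:  S_h, an assumed variability measure per stratum.
    m:             total number of clients to sample in the round.
    """
    sizes = np.asarray(stratum_sizes, dtype=float)
    stds = np.asarray(stratum_stds, dtype=float)

    weights = sizes * stds
    if weights.sum() == 0:
        # Assumed fallback: with no variability, allocate proportionally to size.
        weights = sizes.copy()

    raw = m * weights / weights.sum()
    alloc = np.floor(raw).astype(int)

    # Hand out the leftover slots to the largest fractional remainders.
    leftover = m - alloc.sum()
    order = np.argsort(-(raw - alloc))
    alloc[order[:leftover]] += 1

    # A stratum cannot contribute more clients than it holds, so the
    # total may fall below m when a cap binds.
    return np.minimum(alloc, sizes.astype(int))
```

For example, `neyman_allocation([10, 20, 30], [1.0, 2.0, 0.5], 10)` yields `[2, 6, 2]`: the middle stratum has the largest $N_h S_h$ and receives most of the budget, even though the third stratum contains more clients.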
