FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning

Jordan Slessor, Dezheng Kong, Xiaofen Tang, Zheng En Than, Linglong Kong

TL;DR

FedSTaS introduces a dual-strategy for federated learning: stratifying clients by compressed gradient similarity and performing data-level sampling with privacy-preserving sizing. By combining Neyman-allocated client sampling, gradient-norm-based importance sampling, and DP-enabled data sampling, FedSTaS achieves unbiased aggregation, reduced variance, and faster convergence under heterogeneous data. Empirical results on MNIST and CIFAR-100 show FedSTaS outperforms FedSTS in accuracy and speed, with the DP-enabled variant maintaining competitive performance while providing privacy guarantees of $\epsilon=3$. The work demonstrates practical gains for efficient and private FL, and outlines avenues for theoretical convergence analysis and enhanced data-level sampling strategies.

Abstract

Federated learning (FL) is a machine learning methodology in which multiple decentralized clients collaboratively train a global model in a privacy-preserving way. Several FL methods have been introduced to tackle communication inefficiencies, but they do not address how to sample participating clients in each round effectively and in a privacy-preserving manner. In this paper, we propose \textit{FedSTaS}, a client and data-level sampling method inspired by \textit{FedSTS} and \textit{FedSampling}. In each federated learning round, \textit{FedSTaS} stratifies clients based on their compressed gradients, reallocates the number of clients to sample using an optimal Neyman allocation, and samples local data from each participating client using a data uniform sampling strategy. Experiments on three datasets show that \textit{FedSTaS} can achieve higher accuracy scores than those of \textit{FedSTS} within a fixed number of training rounds.
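The Neyman allocation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook form of Neyman allocation, in which stratum $h$ receives a share of the $m$ client slots proportional to $N_h S_h$ (stratum size times within-stratum standard deviation). The function name `neyman_allocation`, the largest-remainder rounding, and the example numbers are illustrative assumptions rather than details taken from the paper, and the sketch does not cap an allocation at the stratum size.

```python
import math

def neyman_allocation(sizes, stds, m):
    """Split m client slots across strata in proportion to N_h * S_h.

    sizes: number of clients N_h in each stratum
    stds:  within-stratum variability S_h (hypothetical input, e.g. the
           standard deviation of gradient norms inside the stratum)
    m:     total number of clients to sample this round
    """
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # No variability anywhere: fall back to proportional allocation.
        weights, total = list(sizes), sum(sizes)
    ideal = [m * w / total for w in weights]
    # Largest-remainder rounding so the integer allocations sum to m.
    alloc = [math.floor(x) for x in ideal]
    leftover = m - sum(alloc)
    order = sorted(range(len(ideal)),
                   key=lambda h: ideal[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

# Three strata of 50, 30 and 20 clients with increasing variability.
print(neyman_allocation([50, 30, 20], [1.0, 2.0, 4.0], 10))  # [3, 3, 4]
```

In this example the smallest stratum receives the most slots because its assumed variability is highest, which is the behaviour that distinguishes Neyman allocation from purely proportional sampling.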

Paper Structure

This paper contains 13 sections, 3 theorems, 12 equations, 3 figures, 3 algorithms.

Key Result

Lemma 2.1

Let $\bm w_{t+1}$ be computed via the FedSTaS algorithm. Then $\mathbb{E}[\bm w_{t+1}] = W(\mathcal{K})$, where $W(\mathcal{K})$ denotes the model aggregation computed with all clients.

Figures (3)

  • Figure 1: Visualization of data partition for Dirichlet distribution with $\alpha = 0.01$ and IID settings.
  • Figure 2: Non-convex model with $q = 0.1$, $n_{\text{SGD}} = 3$, $\eta = 0.01$, $B = 128$ on MNIST, with $\alpha = 0.01$, $n_{\text{iter}} = 99$, $K_{\text{desired}} = 2048$, $d' = 9$, $M = 100$, and $\alpha_{\text{dp}} = 0.1616$ (DP privacy $\epsilon = 3$).
  • Figure 3: Non-convex model with $q = 0.1$, $n_{\text{SGD}} = 3$, $\eta = 0.01$, $B = 128$ on MNIST and CIFAR-100 with varying $\alpha$ values (0.01 and 0.001). The desired dimension is $K_{\text{desired}} = 2048$, with $d' = 9$, $M = 100$, and $\alpha_{\text{dp}} = 0.1616$, which ensures a DP privacy level of $\epsilon = 3$.
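The relationship between $\alpha_{\text{dp}} = 0.1616$ and $\epsilon = 3$ in the captions above can be checked with a short sketch. It assumes a randomized-response mechanism in which each client reports its true value with probability $\alpha$ and otherwise reports a uniform draw from $M - 1 = 99$ possible values, giving $\epsilon = \ln\left(1 + \alpha (M-1)/(1-\alpha)\right)$. This mechanism and the function names `rr_epsilon` and `rr_alpha` are assumptions made for illustration; the captions do not state the mechanism, although this form does reproduce the quoted value.

```python
import math

def rr_epsilon(alpha, num_values):
    """Privacy level of the assumed randomized-response mechanism.

    The true value is reported with probability alpha; otherwise a
    uniform draw over num_values outcomes is reported.
    """
    return math.log(1 + alpha * num_values / (1 - alpha))

def rr_alpha(epsilon, num_values):
    """Invert rr_epsilon: the alpha that yields a target epsilon."""
    g = math.exp(epsilon) - 1
    return g / (num_values + g)

# With M = 100, the assumed mechanism draws from M - 1 = 99 values.
alpha = rr_alpha(3.0, 99)
print(round(alpha, 4))                      # 0.1616
print(round(rr_epsilon(alpha, 99), 6))      # 3.0
```

A larger $\alpha$ means clients report truthfully more often, so $\epsilon$ grows and the privacy guarantee weakens.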

Theorems & Definitions (4)

  • Lemma 2.1: Unbiased-ness
  • Lemma 2.2
  • Definition 2.3: $\epsilon$-LDP
  • Lemma 2.4