Client Selection in Federated Learning with Data Heterogeneity and Network Latencies

Harsh Vardhan, Xiaofan Yu, Tajana Rosing, Arya Mazumdar

TL;DR

This work tackles the dual challenge of data heterogeneity and latency heterogeneity in federated learning by introducing two theoretically optimal client selection schemes, DelayHetSubmodular and DelayHetSampling. Built on a pairwise heterogeneity model with constants $B_{ij}$ and $\Gamma_{ij}$, the methods minimize the theoretical runtime to convergence by balancing per-round delays against convergence speed and error. DelayHetSubmodular optimizes a fixed subset using proxies to approximate the full gradient, while DelayHetSampling uses a sampling distribution with controlled bias, both providing convergence guarantees under bounded heterogeneity. Empirically, the schemes achieve up to 20x speedups over strong baselines across 9 datasets and multiple delay profiles, including NYCMesh, demonstrating practical impact for real-world FL deployments with heterogeneous data and networks.

Abstract

Federated learning (FL) is a distributed machine learning paradigm in which multiple clients train locally on their private data and then send the updated models to a central server for global aggregation. The practical convergence of FL is challenged by multiple factors, the primary hurdle being heterogeneity among clients. This heterogeneity manifests as data heterogeneity in the local data distributions and latency heterogeneity during model transmission to the server. While prior research has introduced various efficient client selection methods to alleviate the negative impact of either heterogeneity individually, no efficient methods exist for real-world settings where both occur simultaneously. In this paper, we propose two novel, theoretically optimal client selection schemes that handle both heterogeneities. Our methods solve a simple optimization problem every round, obtained by minimizing the theoretical runtime to convergence. Empirical evaluations on 9 datasets with non-iid data distributions, 2 practical delay distributions, and non-convex neural network models demonstrate that our algorithms are at least competitive with, and up to 20 times better than, the best existing baselines.
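
The trade-off between per-round delay and convergence speed can be illustrated with a toy sketch. This is not the paper's objective or algorithm: the `skew` values, the `rounds_needed` proxy, and the brute-force search are all hypothetical stand-ins, chosen only to show why the fastest clients are not always the best choice when data is heterogeneous.

```python
from itertools import combinations

def round_delay(subset, delays):
    """A synchronous round waits for the slowest selected client."""
    return max(delays[i] for i in subset)

def rounds_needed(subset, skew, base_rounds=100.0):
    """Hypothetical proxy (an assumption, not from the paper): the more the
    subset's average data skew deviates from 0, the more rounds are needed."""
    mean_skew = sum(skew[i] for i in subset) / len(subset)
    return base_rounds * (1.0 + mean_skew ** 2)

def estimated_runtime(subset, delays, skew):
    """Estimated wall-clock time = number of rounds * delay per round."""
    return rounds_needed(subset, skew) * round_delay(subset, delays)

def best_subset(delays, skew, k):
    """Brute-force search over all size-k subsets (toy scale only)."""
    candidates = combinations(range(len(delays)), k)
    return min(candidates, key=lambda s: estimated_runtime(s, delays, skew))

# Made-up per-round delays (seconds) and signed data-skew scores per client.
delays = [1.0, 1.2, 5.0, 0.8]
skew = [1.0, -0.8, -1.0, 0.9]

chosen = best_subset(delays, skew, k=2)
fastest_pair = (0, 3)  # the two lowest-delay clients
print(chosen, estimated_runtime(chosen, delays, skew))
print(fastest_pair, estimated_runtime(fastest_pair, delays, skew))
```

In this toy instance the two fastest clients have similarly skewed data, so a slightly slower but more representative pair gives a lower estimated runtime.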

Paper Structure

This paper contains 38 sections, 6 theorems, 9 equations, 3 figures, 3 tables, 3 algorithms.

Key Result

Lemma 1

For any set $S$, with coefficients defined by Definition 1, $\nabla f_S$ is a biased gradient of $f$, such that $\lVert \nabla f_S(w) - \nabla f(w) \rVert^2 \leq B_S \lVert \nabla f(w) \rVert^2 + \Gamma_S, \;\; \forall w$
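
A bound of this shape can be checked numerically on a toy problem. The sketch below assumes one-dimensional client objectives $f_i(w) = \tfrac{1}{2}(w - c_i)^2$ with made-up centers `centers`; it does not reproduce the coefficients of Definition 1. For this particular toy, the gap between the subset gradient and the full gradient does not depend on $w$, so the inequality holds with $B_S = 0$ and $\Gamma_S$ equal to the squared gap between the two means.

```python
def grad_mean(w, centers, idx):
    """Gradient of the average of f_i(w) = 0.5 * (w - c_i)**2 over clients in idx."""
    return sum(w - centers[i] for i in idx) / len(idx)

# Hypothetical client optima; the subset S holds only the first two clients.
centers = [0.0, 1.0, 2.0, 5.0]
all_clients = range(len(centers))
subset = [0, 1]

# For this toy, Gamma_S is the squared gap between the global and subset means.
global_mean = sum(centers) / len(centers)
subset_mean = sum(centers[i] for i in subset) / len(subset)
gamma_S = (global_mean - subset_mean) ** 2

def squared_bias(w):
    """Squared distance between the subset gradient and the full gradient at w."""
    return (grad_mean(w, centers, subset) - grad_mean(w, centers, all_clients)) ** 2

# The squared bias is the same at every test point and matches gamma_S.
biases = [squared_bias(w) for w in (-3.0, 0.0, 2.5, 10.0)]
print(gamma_S, biases)
```

With objectives whose curvature differs across clients, the gap would also scale with $w$, which is the role the $B_S \lVert \nabla f(w) \rVert^2$ term plays in the bound.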

Figures (3)

  • Figure 1: Left: NYCMesh topology. Right: The round delay distribution of all edge devices.
  • Figure 2: Total wall-clock runtime of different client selection algorithms on the Quadratic dataset under three distinct scenarios. (a) Performance of random client selection and FLANP (Reisizadeh et al., 2020), which handles delays, on purely heterogeneous data without network delays. (b) Performance of random client selection and the heterogeneity-aware schemes DivFL (Balakrishnan et al., 2022) and Power-of-Choice (Jee Cho et al., 2022) on iid data with the NYCMesh latency distribution. (c) Performance of all baselines on heterogeneous data with the heterogeneous NYCMesh network latency distribution. DelayHetSampling and DelayHetSubmodular are our proposed client selection methods.
  • Figure 3: Test accuracies and total runtimes for all baselines on different dataset and delay distribution settings.

Theorems & Definitions (10)

  • Definition 1: Coefficients
  • Lemma 1: Biased Gradient
  • Theorem 2
  • Remark 1: Runtime
  • Remark 2: Submodular
  • Lemma 3: Expected Delay
  • Lemma 4
  • Theorem 5
  • Remark 3: Runtime
  • Proposition 1: Pairwise Heterogeneity Linear Regression