Greedy Shapley Client Selection for Communication-Efficient Federated Learning

Pranava Singhal, Shashi Raj Pandey, Petar Popovski

TL;DR

This paper tackles efficient Federated Learning (FL) under strict communication budgets and client heterogeneity by introducing GreedyFed, a biased client selection method based on the cumulative Shapley Value (SV) of each client. It uses GTG-Shapley, a fast Monte Carlo SV approximation, to keep SV computation scalable to many clients, and selects clients in two stages: a round-robin initialization, followed by greedy selection of the top-$M$ contributors, with variants that differ in how SV is averaged across rounds. Across multiple datasets, and under data, system, and privacy heterogeneity as well as timing constraints, the approach converges faster than the baselines, with high accuracy and lower variance. In practice, GreedyFed reduces the number of communication rounds while maintaining model performance, which makes it suited to real-world FL deployments with limited communication opportunities.
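
The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `select_clients` and `update_sv`, the use of `ceil(N / M)` round-robin rounds, and the two averaging rules (running mean, or exponential averaging with a hypothetical factor `alpha`) are all assumptions made for the example.

```python
def select_clients(round_idx, num_clients, m, cumulative_sv):
    """Pick m client indices for the given round (illustrative sketch).

    Assumption: the round-robin stage lasts ceil(num_clients / m) rounds,
    so every client is selected at least once before greedy selection starts.
    """
    init_rounds = -(-num_clients // m)  # integer ceil(N / M)
    if round_idx < init_rounds:
        start = round_idx * m
        # Wrap around if the last round-robin block runs past the end.
        return [(start + i) % num_clients for i in range(m)]
    # Greedy stage: top-m clients by cumulative SV (ties go to the lower index).
    ranked = sorted(range(num_clients), key=lambda k: -cumulative_sv[k])
    return ranked[:m]


def update_sv(cumulative_sv, counts, selected, round_sv, alpha=None):
    """Fold this round's SV estimates into the per-client cumulative SV.

    alpha=None  -> running mean over the rounds in which a client was selected
    alpha=float -> exponential averaging (hypothetical variant for illustration)
    """
    for k, value in zip(selected, round_sv):
        counts[k] += 1
        if alpha is None or counts[k] == 1:
            cumulative_sv[k] += (value - cumulative_sv[k]) / counts[k]
        else:
            cumulative_sv[k] = alpha * cumulative_sv[k] + (1 - alpha) * value
```

In a training loop, the server would call `select_clients` at the start of each round, collect the selected clients' updates, estimate their per-round SV, and pass those estimates to `update_sv`. Only selected clients have their SV refreshed, which is why the round-robin stage is needed to give every client an initial value.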

Abstract

The standard client selection algorithms for Federated Learning (FL) are often unbiased and involve uniform random sampling of clients. This has been proven sub-optimal for fast convergence under practical settings characterized by significant heterogeneity in data distribution, computing, and communication resources across clients. For applications having timing constraints due to limited communication opportunities with the parameter server (PS), the client selection strategy is critical to complete model training within the fixed budget of communication rounds. To address this, we develop a biased client selection strategy, GreedyFed, that identifies and greedily selects the most contributing clients in each communication round. This method builds on a fast approximation algorithm for the Shapley Value at the PS, making the computation tractable for real-world applications with many clients. Compared to various client selection strategies on several real-world datasets, GreedyFed demonstrates fast and stable convergence with high accuracy under timing constraints and when imposing a higher degree of heterogeneity in data distribution, systems constraints, and privacy requirements.

Paper Structure (10 sections, 2 equations, 3 figures, 4 tables, 2 algorithms)

Figures (3)

  • Figure 1: Test Accuracy versus Communication Rounds comparison on CIFAR10, data heterogeneity $\alpha = 10^{-4}$. GreedyFed surpasses all baselines with the fastest convergence and lowest standard deviation, achieving an accuracy close to centralized training.
  • Algorithm: Greedy Shapley-based Client Selection $\text{GreedyFed}(\{\mathcal{D}_k\},\, \mathcal{D}_{\textrm{val}},\,w^{(0)}, \, T)$
  • Algorithm: Server-Side Shapley Value Approximation $\text{GTG-Shapley}(w^{(t)},\,\{w_{k}^{(t+1)}\}_{k \in S_t},\, \mathcal{D}_{\textrm{val}})$
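
To give a sense of what a Monte Carlo Shapley Value approximation computes, the sketch below uses plain permutation sampling: each client's SV is estimated as its average marginal contribution over random orderings. It does not reproduce GTG-Shapley's specific refinements, and the names `monte_carlo_shapley` and `toy_utility` are hypothetical. In the setting above, the utility of a subset would presumably be the performance on $\mathcal{D}_{\textrm{val}}$ of a model aggregated from that subset's updates; here it is replaced by a toy additive function so the example runs on its own.

```python
import random


def monte_carlo_shapley(players, utility, num_permutations=200, seed=0):
    """Estimate Shapley Values by averaging marginal contributions
    over randomly sampled permutations of the players."""
    rng = random.Random(seed)
    totals = {k: 0.0 for k in players}
    for _ in range(num_permutations):
        perm = list(players)
        rng.shuffle(perm)
        coalition = []
        prev = utility(frozenset())  # utility of the empty coalition
        for k in perm:
            coalition.append(k)
            cur = utility(frozenset(coalition))
            totals[k] += cur - prev  # marginal contribution of k
            prev = cur
    return {k: total / num_permutations for k, total in totals.items()}


# Toy stand-in for "validation performance of the aggregated model":
# an additive game in which each client's true SV equals its weight.
TOY_WEIGHTS = {0: 0.5, 1: 0.3, 2: 0.2}


def toy_utility(subset):
    return sum(TOY_WEIGHTS[k] for k in subset)


estimates = monte_carlo_shapley(list(TOY_WEIGHTS), toy_utility)
```

Because the marginal contributions along any single permutation telescope, the estimates always sum to the utility of the full set minus that of the empty set, regardless of how many permutations are sampled. Each permutation costs one utility evaluation per player, which is why cheaper approximations matter when every evaluation involves scoring a model on validation data.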