Maverick-Aware Shapley Valuation for Client Selection in Federated Learning

Mengwei Yang, Ismat Jarin, Baturalp Buyukates, Salman Avestimehr, Athina Markopoulou

TL;DR

This work tackles data heterogeneity in Federated Learning (FL) by focusing on Mavericks: clients that exclusively own one or more rare data classes. It introduces a Maverick-aware Shapley valuation that computes class-wise contributions $\phi_i^c$ and accumulates them into $S_i^c$, forming a per-client score $\hat{S}_i=\sum_c \beta^c S_i^c$, where $\beta^c$ captures the difficulty of class $c$. FedMS uses these scores to guide round-by-round client selection and to assign Shapley Rewards $R_i^t=\sum_c \beta^c \phi_i^c$, so that Mavericks are fairly valued and effectively utilized. Empirical results on MNIST and CIFAR-10 show that FedMS improves both model accuracy and the fairness of the reward distribution compared to multiple baselines.

Abstract

Federated Learning (FL) allows clients to train a model collaboratively without sharing their private data. One key challenge in practical FL systems is data heterogeneity, particularly in handling clients with rare data, also referred to as Mavericks. These clients own one or more data classes exclusively, and the model performance becomes poor without their participation. Thus, utilizing Mavericks throughout training is crucial. In this paper, we first design a Maverick-aware Shapley valuation that fairly evaluates the contribution of Mavericks. The main idea is to compute the clients' Shapley values (SV) class-wise, i.e., per label. Next, we propose FedMS, a Maverick-Shapley client selection mechanism for FL that intelligently selects the clients that contribute the most in each round, by employing our Maverick-aware SV-based contribution score. We show that, compared to an extensive list of baselines, FedMS achieves better model performance and fairer Shapley Rewards distribution.
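
The class-wise valuation and score-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses exact Shapley enumeration over a three-client toy problem (the paper's experiments use GTG-Shapley, an approximation), and the names `classwise_shapley`, `weighted_score`, `select_top_k`, `toy_utility`, the `holdings` table, and the `beta` values are all assumptions made for illustration. With a single round, the accumulated value $S_i^c$ is taken to equal $\phi_i^c$.

```python
from itertools import combinations
from math import factorial


def classwise_shapley(clients, num_classes, utility):
    """Exact Shapley value of every client, computed separately per class.

    `utility(coalition, c)` is a hypothetical per-class utility, e.g. the
    test accuracy on class c of a model aggregated from `coalition`.
    """
    n = len(clients)
    phi = {i: [0.0] * num_classes for i in clients}
    for i in clients:
        others = [j for j in clients if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                for c in range(num_classes):
                    gain = utility(s | {i}, c) - utility(s, c)
                    phi[i][c] += weight * gain
    return phi


def weighted_score(per_class_values, beta):
    """Class-weighted sum: sum_c beta^c * value^c."""
    return sum(b * v for b, v in zip(beta, per_class_values))


def select_top_k(scores, k):
    """Pick the k clients with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy setup (assumed): clients 0 and 1 share classes 0 and 1,
# while client 2 is a Maverick that alone holds class 2.
holdings = {0: {0, 1}, 1: {0, 1}, 2: {2}}


def toy_utility(coalition, c):
    # Assumed utility: class c is "learned" iff some member holds it.
    return 1.0 if any(c in holdings[i] for i in coalition) else 0.0


phi = classwise_shapley(list(holdings), 3, toy_utility)
beta = [1.0, 1.0, 2.0]  # assumed difficulty weights; class 2 counts double
scores = {i: weighted_score(phi[i], beta) for i in phi}
selected = select_top_k(scores, k=2)
```

In this toy problem the Maverick receives the full Shapley value for its exclusive class, while the two clients that share classes split the credit for them. With uniform weights all three clients would tie; the larger weight on the rare class is what lifts the Maverick to the top of the selection.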

Paper Structure (9 sections, 8 equations, 10 figures, 3 tables, 3 algorithms)

Figures (10)

  • Figure 1: Multiple devices participate in FL for a voice AI task. A few devices that exclusively own rare data, i.e., non-native accent data, are the Mavericks and crucial for training.
  • Figure 2: Comparison of test accuracy and Shapley rewards with 5 clients (w/o client selection) for the MNIST dataset using GTG-Shapley.
  • Figure 3: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the MNIST dataset using GTG-Shapley for various client selection techniques.
  • Figure 4: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the MNIST dataset using GTG-Shapley for various client selection techniques.
  • Figure 5: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using GTG-Shapley for various client selection techniques.
  • ...and 5 more figures