Towards Fairness in Provably Communication-Efficient Federated Recommender Systems

Kirandeep Kaur, Sujit Gujar, Shweta Jain

TL;DR

This work tackles the dual challenges of communication efficiency and fairness in federated recommender systems (FRSs) that use matrix factorization. It introduces RS-FairFRS, a framework that combines random client sampling with a two-phase dual-fair updating scheme (server-side FairMF and local FO-ClientBatch) to preserve accuracy while reducing communication and mitigating demographic bias. The authors derive sample-complexity bounds showing that sampling a fraction around 0.35 of clients suffices to maintain performance, and empirically demonstrate substantial communication cost reductions (~47%) and bias reductions (~40%) on ML1M and ML100K with limited server data. The results suggest a practical path toward fair, communication-efficient FRSs that do not require sharing sensitive attributes, with implications for real-world deployments and future fairness-accuracy analyses.

Abstract

To reduce the communication overhead caused by parallel training of multiple clients, various federated learning (FL) techniques use random client sampling. Nonetheless, ensuring the efficacy of random sampling and determining the optimal number of clients to sample in federated recommender systems (FRSs) remains challenging because each user is isolated as a separate client. This challenge is exacerbated in models where public and private features can be separated and FL allows communication of only public features (item gradients). In this study, we establish sample complexity bounds that dictate the ideal number of clients required for improved communication efficiency and retained accuracy in such models. In line with our theoretical findings, we empirically demonstrate that RS-FairFRS reduces communication cost by about 47%. Second, we demonstrate the presence of class imbalance among clients, which raises a substantial equity concern for FRSs. Unlike in centralized machine learning, clients in an FRS cannot share raw data, including sensitive attributes. To address this, we introduce RS-FairFRS, the first fairness-under-unawareness FRS built upon a random-sampling-based FRS. While random sampling improves communication efficiency, we propose a novel two-phase dual-fair update technique to achieve fairness without revealing the protected attributes of active clients participating in training. Our results on real-world datasets and different sensitive features show a significant reduction in demographic bias (about 40%), offering a promising path to achieving fairness and communication efficiency in FRSs without compromising overall accuracy.

Paper Structure

This paper contains 23 sections, 3 theorems, 17 equations, 8 figures, 1 table, and 3 algorithms.

Key Result

lemma 1

Suppose $n$ clients are uniformly distributed amongst $K$ clusters. Then a subset $C^\tau \subseteq [n]$ sampled uniformly at random (without replacement) will contain an approximately equal number of clients from each cluster, i.e., the imbalance across clusters is very low.
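The claim of the lemma can be checked numerically with a small simulation. The sketch below is illustrative only: the values of `n`, `k`, and `tau`, the function name `sample_cluster_counts`, and the rule that client `i` belongs to cluster `i % k` are assumptions made for the example, not details taken from the paper.

```python
import random
from collections import Counter

def sample_cluster_counts(n, k, tau, seed=0):
    """Sample a fraction tau of n clients uniformly without replacement and
    count how many sampled clients fall in each of k equal-sized clusters."""
    rng = random.Random(seed)
    # Assumption: client i belongs to cluster i % k, so clusters are balanced.
    cluster_of = [i % k for i in range(n)]
    m = round(tau * n)
    sampled = rng.sample(range(n), m)  # uniform, without replacement
    counts = Counter(cluster_of[i] for i in sampled)
    return [counts.get(c, 0) for c in range(k)]

# Hypothetical setting: 6000 clients, 5 clusters, 35% of clients sampled.
counts = sample_cluster_counts(n=6000, k=5, tau=0.35)
expected = 0.35 * 6000 / 5  # 420 clients per cluster if perfectly balanced
max_rel_dev = max(abs(c - expected) for c in counts) / expected
print(counts, round(max_rel_dev, 3))
```

Under these assumptions each per-cluster count follows a hypergeometric distribution, so the relative deviation from the balanced count shrinks as the sample size grows.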

Figures (8)

  • Figure 1: Ideal number of clusters in both datasets.
  • Figure 2: Experimental analysis of random sampling of clients and clusters.
  • Figure 3: The RS-FairFRS framework. Every user trains a local recommendation model and generates user and item vectors. A fraction of clients $C^{\tau}$ is randomly sampled, and their item gradients are communicated to the server. The server aggregates these gradients for each item to generate the global model $V$ and trains it towards fairness using FairMF to generate $U_{fair}$ and $V_{fair}$. Then, $V$ and $V_{fair}$ are communicated to every client. Each client trains $V$ to minimize the local loss and $V_{fair}$ to reduce the difference between the local and the fair global model.
  • Figure 4: Comparison plots for accuracy and average time per communication round on different values of $\tau$ in RS-FedRec on two datasets, ML1M and ML100K.
  • Figure 5: Difference in FedRec RMSE scores for different sensitive attributes over two datasets.
  • ...and 3 more figures
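
The sampling and aggregation steps described in the Figure 3 caption can be sketched as a single communication round of federated matrix factorization. This is a minimal sketch under stated assumptions, not the authors' implementation: the squared-error loss with L2 regularization, the learning rate, the synthetic ratings, and the names `client_update` and `run_round` are all hypothetical. As a simplification, only the sampled clients take a local step here, and the fairness phase (FairMF, $V_{fair}$) is omitted because this summary does not specify it.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def client_update(ratings, u, V, lr, lam):
    """One local step on a client. Returns the updated private user vector and
    the item gradients, which are the only quantity meant to leave the client."""
    user_grad = [2 * lam * x for x in u]
    item_grads = {}
    for j, r in ratings.items():
        err = dot(u, V[j]) - r
        for d in range(len(u)):
            user_grad[d] += 2 * err * V[j][d]
        item_grads[j] = [2 * err * x + 2 * lam * y for x, y in zip(u, V[j])]
    new_u = [x - lr * g for x, g in zip(u, user_grad)]
    return new_u, item_grads

def run_round(clients, U, V, tau, rng, lr=0.01, lam=0.1):
    """One communication round in which a fraction tau of clients is sampled."""
    m = max(1, round(tau * len(clients)))
    sampled = rng.sample(range(len(clients)), m)  # uniform, without replacement
    agg = [[0.0] * len(v) for v in V]
    for c in sampled:
        U[c], item_grads = client_update(clients[c], U[c], V, lr, lam)
        for j, g in item_grads.items():
            agg[j] = [a + b for a, b in zip(agg[j], g)]
    # Server step: apply the item gradients averaged over the sampled clients.
    for j in range(len(V)):
        V[j] = [v - lr * a / m for v, a in zip(V[j], agg[j])]
    return sampled

# Synthetic toy data (assumption): 20 clients, 8 items, 4 ratings per client.
rng = random.Random(0)
n_clients, n_items, dim = 20, 8, 3
clients = [{j: float(rng.randint(1, 5)) for j in rng.sample(range(n_items), 4)}
           for _ in range(n_clients)]
U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_clients)]
V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
V_before = [v[:] for v in V]
sampled = run_round(clients, U, V, tau=0.35, rng=rng)
print(len(sampled), "of", n_clients, "clients uploaded item gradients")
```

In this sketch the user vectors in `U` never reach the server step, which mirrors the separation between private (user) and public (item) features described in the abstract.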

Theorems & Definitions (5)

  • Definition 3.2.1
  • Definition 3.2.2
  • lemma 1
  • lemma 2
  • theorem 1: Random Sampling of Clients