Fair Decentralized Learning

Sayan Biswas, Anne-Marie Kermarrec, Rishi Sharma, Thibaud Trinca, Martijn de Vos

TL;DR

This work tackles fairness in decentralized learning under feature heterogeneity by introducing Facade, a clustering-based approach where each node maintains a common core and multiple cluster-specific heads. Clusters emerge dynamically as nodes select the head that minimizes local loss, enabling per-cluster models that improve minority utility while collaborating for global performance. The authors prove convergence under standard assumptions, and extensive experiments on CIFAR-10, Imagenette, and Flickr-Mammals show that Facade achieves higher per-cluster accuracy and fairness than three strong baselines, while also reducing communication costs. Overall, Facade offers a scalable, decentralized solution that simultaneously enhances model utility and fairness across data-distribution clusters, with practical implications for healthcare and other high-stakes domains.

Abstract

Decentralized learning (DL) is an emerging approach that enables nodes to collaboratively train a machine learning model without sharing raw data. In many application domains, such as healthcare, this approach faces challenges due to the high level of heterogeneity in the training data's feature space. Such feature heterogeneity lowers model utility and negatively impacts fairness, particularly for nodes with under-represented training data. In this paper, we introduce Facade, a clustering-based DL algorithm specifically designed for fair model training when the training data exhibits several distinct features. The challenge for Facade is to assign nodes to clusters, one for each feature, based on the similarity in the features of their local data, without requiring individual nodes to know a priori which cluster they belong to. Facade (1) dynamically assigns nodes to their appropriate clusters over time, and (2) enables nodes to collaboratively train a specialized model for each cluster in a fully decentralized manner. We theoretically prove the convergence of Facade, implement our algorithm, and compare it against three state-of-the-art baselines. Our experimental results on three datasets demonstrate the superiority of our approach in terms of model accuracy and fairness compared to all three competitors. Compared to the best-performing baseline, Facade on the CIFAR-10 dataset also reduces communication costs by 32.3% to reach a target accuracy when cluster sizes are imbalanced.
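The dynamic cluster assignment can be illustrated with a minimal sketch: a node passes its local data through the shared core, evaluates every cluster-specific head, and keeps the head with the lowest local loss. This is not the authors' implementation. The function name `select_head`, the one-layer `tanh` core, the linear heads, and the cross-entropy loss are all assumptions chosen to keep the example small.

```python
import numpy as np

def select_head(core, heads, X, y):
    """Pick the head with the lowest mean cross-entropy on local data.

    core  : (d, h) weight matrix of the shared core (hypothetical architecture)
    heads : list of (h, c) weight matrices, one per cluster
    X, y  : local samples (n, d) and integer labels (n,)
    """
    features = np.tanh(X @ core)  # shared representation, identical for all heads
    losses = []
    for W in heads:
        logits = features @ W
        # Numerically stable log-softmax
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        losses.append(-log_probs[np.arange(len(y)), y].mean())
    losses = np.array(losses)
    return int(np.argmin(losses)), losses

# Toy usage: two heads, the first aligned with the labels, the second opposed.
core = np.eye(2)
heads = [5.0 * np.eye(2), -5.0 * np.eye(2)]
X = np.array([[2.0, 0.0], [0.0, 2.0]])
y = np.array([0, 1])
chosen, losses = select_head(core, heads, X, y)
```

In this sketch the chosen index would play the role of the node's current cluster identity for the round. Because the selection is recomputed from local loss each time, a node's assignment can change as the heads improve.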

Paper Structure (35 sections, 3 theorems, 13 equations, 21 figures, 4 tables)

This paper contains 35 sections, 3 theorems, 13 equations, 21 figures, 4 tables.

Key Result

Theorem 1

If the assumptions on strong convexity and $L$-smoothness, bounded variance, and initialization hold, then, choosing learning rate $\eta=1/L$, for a fixed node $N_i$, each cluster $j\in [k]$, and any $\delta\in(0,1)$, in every round $t>0$ we have with probability at least $1-\delta$: … where $\epsilon_0 \leq \frac{\nu}{\delta L \sqrt{pnB}}+\frac{\sigma^2}{\delta \alpha^2 \lambda^2 \Delta^4 B}+\frac{\sigma \nu k^{3/2}}{\delta^{3/2}\alpha\cdots}\cdots$

Figures (21)

  • Figure 1: The test accuracy of a model trained with EL on CIFAR-10. Standard DL algorithms such as D-PSGD and EL result in significantly lower accuracy for the two nodes in the minority cluster than for the nodes in the majority cluster. The error bars indicate the standard deviation of test accuracy.
  • Figure 2: The different operations during a training round in Facade, from the perspective of node $N_i$.
  • Figure 3: Average test accuracy for the nodes in the majority cluster (left) and those in the minority (right) obtained on CIFAR-10 ($\uparrow$ is better), for different cluster configurations.
  • Figure 4: Average test accuracy for the nodes in the majority cluster (left) and those in the minority (right) obtained on Imagenette ($\uparrow$ is better), for different cluster configurations.
  • Figure 5: Highest observed fair accuracy for CIFAR-10 (top) and Imagenette (bottom), for varying cluster configurations and algorithms ($\uparrow$ is better).
  • ...and 16 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Remark 1
  • Remark 2