Fair Decentralized Learning
Sayan Biswas, Anne-Marie Kermarrec, Rishi Sharma, Thibaud Trinca, Martijn de Vos
TL;DR
This work tackles fairness in decentralized learning under feature heterogeneity by introducing Facade, a clustering-based approach in which each node maintains a common core and multiple cluster-specific heads. Clusters emerge dynamically as each node selects the head that minimizes its local loss. This yields per-cluster models that improve utility for nodes in minority clusters while the common core still lets all nodes collaborate for global performance. The authors prove convergence under standard assumptions, and extensive experiments on CIFAR-10, Imagenette, and Flickr-Mammals show that Facade achieves higher per-cluster accuracy and fairness than three strong baselines while also reducing communication costs. Overall, Facade offers a scalable, decentralized solution that improves both model utility and fairness across data-distribution clusters, with practical implications for healthcare and other high-stakes domains.
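The head-selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a linear head applied to the output of the shared core, a squared-error loss, and plain Python lists in place of tensors. The names `local_loss` and `select_head` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (input features, regression target); assumed data format


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def local_loss(core: Callable[[Vector], Vector], head: Vector,
               data: Sequence[Sample]) -> float:
    """Mean squared error of a linear head applied to core(x) on the node's local data."""
    total = 0.0
    for x, y in data:
        prediction = dot(head, core(x))
        total += (prediction - y) ** 2
    return total / len(data)


def select_head(core: Callable[[Vector], Vector], heads: Sequence[Vector],
                data: Sequence[Sample]) -> int:
    """Return the index of the head with the lowest local loss, i.e. the node's current cluster."""
    losses = [local_loss(core, head, data) for head in heads]
    return min(range(len(heads)), key=losses.__getitem__)


if __name__ == "__main__":
    identity_core = lambda x: x  # hypothetical stand-in for the shared core network
    heads = [[1.0, 0.0], [0.0, 1.0]]
    data = [([1.0, 2.0], 2.0), ([3.0, -1.0], -1.0)]  # targets equal x[1]
    print(select_head(identity_core, heads, data))  # → 1
```

Because the selection is recomputed from local loss alone, a node never needs to know its cluster in advance; its assignment can change from round to round as the heads evolve.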
Abstract
Decentralized learning (DL) is an emerging approach that enables nodes to collaboratively train a machine learning model without sharing raw data. In many application domains, such as healthcare, this approach is challenged by high heterogeneity in the feature space of the training data. Such feature heterogeneity lowers model utility and negatively impacts fairness, particularly for nodes with under-represented training data. In this paper, we introduce Facade, a clustering-based DL algorithm specifically designed for fair model training when the training data exhibits several distinct features. The key challenge Facade addresses is assigning nodes to clusters, one for each feature, based on the similarity of the features of their local data, without requiring individual nodes to know a priori which cluster they belong to. Facade (1) dynamically assigns nodes to their appropriate clusters over time, and (2) enables nodes to collaboratively train a specialized model for each cluster in a fully decentralized manner. We theoretically prove the convergence of Facade, implement our algorithm, and compare it against three state-of-the-art baselines. Our experimental results on three datasets show that our approach outperforms all three competitors in both model accuracy and fairness. On the CIFAR-10 dataset with imbalanced cluster sizes, Facade also reduces the communication cost required to reach a target accuracy by 32.3% compared to the best-performing baseline.
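The idea of training one specialized model per cluster while sharing a common core can be illustrated with a single aggregation step. This is a hypothetical sketch: the abstract does not specify Facade's exchange protocol. It assumes that models are flat lists of floats, that each neighbour sends its core together with the index and parameters of the head it currently selects, that the core is averaged over all received copies, and that each head is averaged only with copies of the same head index. The names `average`, `aggregate`, and `Message` are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Message = Tuple[Vector, int, Vector]  # assumed: (core, selected head index, head params)


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized parameter vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate(own_core: Vector, own_heads: Sequence[Vector],
              received: Sequence[Message]) -> Tuple[Vector, List[Vector]]:
    """One hypothetical aggregation round at a single node."""
    # The shared core is averaged with every neighbour's core.
    new_core = average([own_core] + [core for core, _, _ in received])
    new_heads = []
    for k, head in enumerate(own_heads):
        # Head k is averaged only with neighbours that currently selected head k;
        # if none did, the local copy is kept unchanged.
        copies = [h for _, idx, h in received if idx == k]
        new_heads.append(average([head] + copies))
    return new_core, new_heads


if __name__ == "__main__":
    core, heads = aggregate(
        [0.0, 0.0],
        [[1.0, 1.0], [2.0, 2.0]],
        [([2.0, 2.0], 0, [3.0, 3.0]), ([4.0, 4.0], 1, [4.0, 4.0])],
    )
    print(core, heads)  # → [2.0, 2.0] [[2.0, 2.0], [3.0, 3.0]]
```

Averaging a head only within its cluster lets a minority cluster keep a specialized head, while the shared core still benefits from every node's data.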
