Fair Distributed Machine Learning with Imbalanced Data as a Stackelberg Evolutionary Game

Sebastian Niehaus, Ingo Roeder, Nico Scherf

TL;DR

This work tackles fairness in decentralized learning under data imbalance by modeling the system as a Stackelberg evolutionary game. It introduces two weighting schemes, the Deterministic Stackelberg Weighting Model (DSWM) and the Adaptive Stackelberg Weighting Model (ASWM), where node contributions to the global model are strategically chosen to address node-specific losses. Experiments on three Dirichlet-balanced medical datasets (BloodMNIST, DermaMNIST, BreastMNIST) show that ASWM substantially benefits underrepresented nodes, delivering notable AUC gains while imposing only small penalties on nodes with larger datasets. The study highlights the potential of game-theoretic approaches to improve fairness in distributed learning, while also acknowledging the sequential nature of Stackelberg updates as a limitation and a direction for future integration with parallel, scalable federated frameworks.

Abstract

Decentralised learning enables the training of deep learning algorithms without centralising datasets, resulting in benefits such as improved data privacy, operational efficiency and the fostering of data ownership policies. However, significant data imbalances pose a challenge in this framework. Participants with smaller datasets in distributed learning environments often achieve poorer results than participants with larger datasets. Data imbalances are particularly pronounced in medical fields and are caused by different patient populations, technological inequalities and divergent data collection practices. In this paper, we consider distributed learning as a Stackelberg evolutionary game. We present two algorithms for setting the weights of each node's contribution to the global model in each training round: the Deterministic Stackelberg Weighting Model (DSWM) and the Adaptive Stackelberg Weighting Model (ASWM). We use three medical datasets to highlight the impact of dynamic weighting on underrepresented nodes in distributed learning. Our results show that the ASWM significantly favours underrepresented nodes by improving their performance by 2.713% in AUC. Meanwhile, nodes with larger datasets experience only a modest average performance decrease of 0.441%.
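
To make the idea of per-round contribution weights concrete, the following is a minimal sketch of a weighted global update. It is not the DSWM or ASWM procedure, whose details are not given in this summary. The weighting rule (a softmax over node losses, so that nodes with higher loss contribute more), the function names `loss_based_weights` and `aggregate`, the `temperature` parameter and the toy values are all assumptions made for illustration.

```python
import math


def loss_based_weights(losses, temperature=1.0):
    """Hypothetical weighting rule (an assumption, not DSWM/ASWM):
    softmax over node losses, so higher-loss nodes get larger weights."""
    scaled = [loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(node_params, weights):
    """Weighted average of per-node parameter vectors (flat lists of floats)."""
    if len(node_params) != len(weights):
        raise ValueError("need exactly one weight per node")
    dim = len(node_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, node_params))
        for i in range(dim)
    ]


# Toy round with three nodes; the third node has the highest loss.
node_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
node_losses = [0.2, 0.5, 0.9]

weights = loss_based_weights(node_losses)
global_params = aggregate(node_params, weights)
print(weights, global_params)
```

In this toy round the node with the highest loss receives the largest weight, so the aggregated parameters are pulled toward that node's update. A scheme that aims to favour underrepresented nodes could plug a different rule into the same aggregation step.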

Paper Structure

This paper contains 8 sections, 15 equations, 1 figure, 4 tables, 1 algorithm.

Figures (1)

  • Figure 1: Overview of the estimation of contribution weights for model updates and of the data distribution across nodes. The leader is shown in light green, the first follower in dark blue, and the second follower in light blue. Panel (a) depicts the leader facilitating a global update. Panel (b) illustrates the class-wise distribution across all datasets and nodes. Panels (c) and (d) present the contribution weights selected for the respective nodes by the ASWM method (c) and the DSWM method (d). The x-axis shows the rounds of global model updates.