Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting

Georgios Tsoumplekas, Ilias Siniosoglou, Vasileios Argyriou, Ioannis D. Moscholios, Panagiotis Sarigiannidis

TL;DR

This paper tackles the dual challenges of class imbalance and privacy in medical data by integrating Balanced-MixUp (Bal-MixUp), a data-regularization technique for imbalanced learning, with Federated Learning (FL) to predict cardiovascular disease outcomes from tabular data. The method augments each node's local data with minority-focused synthetic samples and aggregates the locally trained models with Federated Averaging, so raw patient data never leave the nodes. Across four real-world cardiovascular disease datasets, Bal-MixUp in FL consistently improves the F-Score and remains robust to hyperparameter choices and to a limited number of communication rounds, which makes it practical for resource-constrained, privacy-conscious medical deployments. Overall, the work shows that combining data regularization with FL enables accurate, privacy-preserving cardiovascular risk prediction on distributed, heterogeneous datasets.
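
The augmentation step can be illustrated with a minimal sketch. It assumes the common Balanced-MixUp formulation: each synthetic sample interpolates a row drawn by ordinary (instance-based) sampling with a row drawn by class-balanced sampling, using a mixing weight lambda ~ Beta(alpha, 1), where alpha is presumably the hyperparameter varied in Figure 3. The helper name `balanced_mixup_batch`, the single lambda per batch, and the soft one-hot labels are illustrative assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def balanced_mixup_batch(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    batch_size: int,
    alpha: float,
    num_classes: int = 2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical helper: return mixed feature rows and soft labels.

    Each synthetic row interpolates an instance-sampled row with a
    class-balanced row, so minority-class rows are used far more often
    than their natural frequency.
    """
    if rng is None:
        rng = random.Random()
    # Index rows by class so the balanced sampler can pick a class first.
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    classes = sorted(by_class)

    # Assumption: one mixing weight per batch, lambda ~ Beta(alpha, 1).
    lam = rng.betavariate(alpha, 1.0)

    batch_X: List[Vector] = []
    batch_y: List[Vector] = []
    for _ in range(batch_size):
        i = rng.randrange(len(X))  # instance-based: follows the imbalanced distribution
        j = rng.choice(by_class[rng.choice(classes)])  # class-balanced: uniform class, then a row of it
        batch_X.append([lam * a + (1.0 - lam) * b for a, b in zip(X[i], X[j])])
        yi, yj = one_hot(y[i], num_classes), one_hot(y[j], num_classes)
        batch_y.append([lam * a + (1.0 - lam) * b for a, b in zip(yi, yj)])
    return batch_X, batch_y
```

For example, on a toy node with nine negatives and one positive, `balanced_mixup_batch(X, y, batch_size=32, alpha=0.2)` returns 32 mixed rows whose soft labels each sum to 1. In expectation, half of the class-balanced partners come from the single positive row.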

Abstract

The increased availability of medical data has significantly impacted healthcare by enabling the application of machine and deep learning approaches to a wide range of tasks. However, medical datasets are usually small, scattered across multiple providers, highly class-imbalanced, and subject to stringent data privacy constraints. This paper proposes applying a data regularization algorithm suited to learning under high class imbalance within a federated learning setting. Specifically, the proposed method aims to improve model performance for cardiovascular disease prediction in two ways: by tackling the class imbalance that typically characterizes datasets used for this purpose, and by leveraging patient data held at different nodes of a federated ecosystem without compromising their privacy, while enabling more resource-sensitive allocation. The method is evaluated on four cardiovascular disease prediction datasets distributed across different clients and achieves improved performance. Its robustness under various hyperparameter settings and its ability to adapt to different resource allocation scenarios are also verified.
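
The federated part of the pipeline, which the TL;DR identifies as Federated Averaging, can be sketched as a weighted average of client parameters, w = sum_k (n_k / n) w_k, repeated for a fixed number of communication rounds. This is a minimal pure-Python sketch under stated assumptions: the logistic-regression `local_train` step, its learning rate, and all function names are hypothetical stand-ins, since the paper's model and training configuration are not given here. In the described pipeline, each client would train on its Balanced-MixUp-augmented data instead.

```python
import math
from typing import List, Sequence, Tuple

Weights = List[float]
Dataset = Tuple[List[List[float]], List[float]]  # (feature rows, hard or soft labels)


def fedavg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Federated Averaging: average parameters weighted by local sample counts n_k / n."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


def local_train(w: Weights, data: Dataset, lr: float = 0.1, epochs: int = 1) -> Weights:
    """Hypothetical local step: SGD on logistic regression; the last weight is the bias."""
    X, y = data
    w = list(w)  # copy so the global model is never mutated in place
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # zip stops before the bias
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            g = 1.0 / (1.0 + math.exp(-z)) - t  # gradient of log-loss w.r.t. z (works for soft t)
            for d in range(len(x)):
                w[d] -= lr * g * x[d]
            w[-1] -= lr * g
    return w


def run_federated(clients: Sequence[Dataset], rounds: int, dim: int) -> Weights:
    """Broadcast, train locally, aggregate; only weights ever leave a client."""
    global_w = [0.0] * (dim + 1)
    for _ in range(rounds):
        local = [local_train(global_w, data) for data in clients]
        global_w = fedavg(local, [len(data[0]) for data in clients])
    return global_w
```

In this sketch the `rounds` argument plays the role of the communication-round budget varied in Figure 4, and weighting by `n_k` lets larger clients contribute proportionally more to the global model.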

Paper Structure (19 sections, 2 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Model training process, including the use of Balanced-MixUp to deal with class imbalance and training in a federated learning setting.
  • Figure 2: Two-dimensional t-SNE representations of samples from the Framingham dataset.
  • Figure 3: F-Score of the examined methods for varying values of $\alpha$ in each dataset.
  • Figure 4: F-Score of the examined methods for a varying number of communication rounds in each dataset.
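
Figures 3 and 4 report the F-Score, so a short sketch of how it is computed may help. This assumes the binary F1 score on the positive (minority, disease) class; the paper may use a different averaging scheme, and the function name `f_score` is illustrative. The toy example shows why the F-Score is preferred over accuracy under high class imbalance: a model that never predicts disease reaches 90% accuracy but an F-Score of 0.

```python
from typing import Sequence


def f_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1, beta: float = 1.0) -> float:
    """F-beta for one class: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0  # define 0/0 as 0


# Toy test set: 90 healthy (0) and 10 diseased (1) patients.
y_true = [0] * 90 + [1] * 10
all_negative = [0] * 100                        # accuracy 0.90, yet misses every patient
mixed = [0] * 86 + [1] * 4 + [1] * 8 + [0] * 2  # TP = 8, FP = 4, FN = 2
print(f_score(y_true, all_negative))     # 0.0
print(round(f_score(y_true, mixed), 3))  # 0.727
```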