FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data

Yukun Zhang, Guanzhong Chen, Zenglin Xu, Jianyong Wang, Dun Zeng, Junfan Li, Jinghua Wang, Yuan Qi, Irwin King

TL;DR

This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD, based on naturally scattered datasets constructed from the CVD data of seven institutions, and reveals that FL faces new challenges with real-world non-IID and long-tail data.

Abstract

Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment. Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial high-quality data. However, the sensitive nature of healthcare data often restricts individual clinical institutions from sharing data to train sufficiently generalized and unbiased ML models. Federated Learning (FL) is an emerging approach that offers a promising solution by enabling collaborative model training across multiple participants without compromising the privacy of the individual data owners. However, to the best of our knowledge, there has been limited prior research applying FL to the cardiovascular disease domain. Moreover, existing FL benchmarks and datasets are typically simulated and may fall short of replicating the natural heterogeneity of realistic datasets, which challenges current FL algorithms. To address these gaps, this paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD. This benchmark comprises two major tasks: electrocardiogram (ECG) classification and echocardiogram (ECHO) segmentation, based on naturally scattered datasets constructed from the CVD data of seven institutions. Our extensive experiments on these datasets reveal that FL faces new challenges with real-world non-IID and long-tail data. The code and datasets of FedCVD are available at https://github.com/SMILELab-FL/FedCVD.

Paper Structure

This paper contains 57 sections, 5 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: The overall architecture of the proposed FedCVD benchmark. We present two main settings (Fed-ECG, Fed-ECHO) and an experimental platform, highlighting three primary challenges. Green and blue circles in the challenges section indicate their presence in Fed-ECG and Fed-ECHO, respectively. The API section highlights user-facing APIs in orange boxes.
  • Figure 2: Demonstration of the nature of Fed-ECG Dataset.
  • Figure 3: Demonstration of Fed-ECG’s challenge: Comparisons of performance (relative Mean Average Score %) between artificial partitions (simulated random and partitions) and Fed-ECG’s natural partition across different algorithms.
  • Figure 4: Demonstration of Fed-ECG's long-tail challenge: Average Macro F1-Score (%) and Standard Deviation across classes for various FL Algorithms.
  • Figure 5: Label non-IID of the Fed-ECG dataset with the artificially non-IID partition, shown as the variation in the number of each label (right axis) across different clients (left axis).
  • ...and 2 more figures