Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data

Vikhyat Agrawal; Sunil Vasu Kalmady; Venkataseetharam Manoj Malipeddi; Manisimha Varma Manthena; Weijie Sun; Saiful Islam; Abram Hindle; Padma Kaul; Russell Greiner

TL;DR

This work addresses privacy-aware, population-scale ECG classification across seven hospitals by implementing Federated Learning (FL) with and without Differential Privacy (DP). Using a CNN-based multi-label model trained on 1.56 million ECGs, the study shows FL achieves performance near that of pooled training, with notable benefits for hospitals with smaller datasets in inter-site evaluation. Incorporating DP via DP-SGD introduces a tangible privacy-utility trade-off: stronger privacy (smaller $\epsilon$) degrades AUROC, while looser privacy improves performance but reduces confidentiality guarantees. The findings support privacy-preserving cross-institution ECG analytics, offering practical implications for regulatory-compliant, large-scale cardiovascular risk stratification and disease labeling.
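The privacy-utility trade-off described above comes from the two operations DP-SGD adds to an ordinary gradient step: per-example gradient clipping and Gaussian noise. The following is a minimal, generic sketch of one such step in plain Python, not the authors' implementation. The function names, the toy gradients, and the parameter names (`max_norm`, `noise_multiplier`) are assumptions made for illustration. A larger `noise_multiplier` corresponds to stronger privacy (smaller $\epsilon$) and a noisier update.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], max_norm: float) -> List[float]:
    """Rescale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: List[float],
    per_example_grads: List[List[float]],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One illustrative DP-SGD update: clip each gradient, sum, add noise, average."""
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, max_norm)):
            summed[i] += g
    # Gaussian noise with standard deviation noise_multiplier * max_norm
    # is added to the clipped sum before averaging over the batch.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Hypothetical toy batch of two per-example gradients.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.6, 0.8]],
    lr=0.1,
    max_norm=1.0,
    noise_multiplier=1.1,
    rng=rng,
)
```

Clipping bounds how much any single ECG can move the model, and the noise scale is tied to that bound. Converting a given noise level into a concrete $\epsilon$ requires a privacy accountant, which this sketch leaves out.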

Abstract

This paper explores how Federated Learning (FL) and Differential Privacy (DP) techniques can be applied to population-scale Electrocardiogram (ECG) data. The study trains a multi-label ECG classification model with FL and DP on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allows the hospitals to train a model collaboratively without sharing raw data, while still producing robust ECG classification models for diagnosing various cardiac conditions. Such accurate models can facilitate diagnosis, and the FL and DP techniques preserve patient confidentiality. Our results show that the performance achieved with our implementation of FL is comparable to that of the pooled approach, in which the model is trained on the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model rather than single-site training. In addition, the study demonstrates the trade-off between model performance and data privacy that arises when DP is employed during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.
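The collaborative training described above relies on aggregating model parameters rather than raw ECGs; the paper's outline names Federated Averaging (FedAvg) as the aggregation algorithm. The following is a minimal sketch of the aggregation step only, assuming each hospital returns its locally trained weights and its training-set size. The function name `fedavg`, the dictionary layout, and the toy numbers are hypothetical and do not come from the authors' code.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site parameters, weighting each site by its training-set size."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one positive-length list of weights per site size")
    total = sum(site_sizes)
    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical toy example: two sites sharing one small parameter vector.
hospital_a = {"conv1": [1.0, 2.0]}
hospital_b = {"conv1": [3.0, 6.0]}
global_weights = fedavg([hospital_a, hospital_b], [300, 100])
print(global_weights)  # {'conv1': [1.5, 3.0]}
```

Weighting by dataset size means larger sites dominate the average, which is one reason smaller hospitals can gain from the shared model: it carries information from data they never see directly.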

Paper Structure (19 sections, 8 figures, 5 tables)

Introduction
Federated Learning
Differential Privacy
Aim of Study
Related Work
Methods
Data Description and Patient Characteristics
Prediction Task
Learning Algorithm
FL/DP Algorithm
Federated Averaging (FedAvg)
Differentially Private Stochastic Gradient Descent (DP-SGD)
Evaluation
Results
Characteristics of Multi-hospital Cohorts
...and 4 more sections

Figures (8)

Figure 1: Diagram illustrating the Federated Learning setup for multi-hospital ECG datasets
Figure 2: Diagram illustrating Differential Privacy
Figure 3: Diagram depicting the prediction task by representing the output using a specific example
Figure 4: Comparison of Model Performance (AUROC in %) between the Federated Learning (FL) Approach and Standard Approach for different sites. The error bars on the graph indicate the 95% confidence intervals.
Figure 5: Diagram explaining the intra-site and inter-site testing framework. For intra-site testing, the classification model is trained and tested on Hospital A's respective train and test sets. In contrast, for inter-site testing, the model is trained on Hospital A's training data and tested on Hospital B's testing data
...and 3 more figures
