Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data
Vikhyat Agrawal, Sunil Vasu Kalmady, Venkataseetharam Manoj Malipeddi, Manisimha Varma Manthena, Weijie Sun, Saiful Islam, Abram Hindle, Padma Kaul, Russell Greiner
TL;DR
This work addresses privacy-aware, population-scale ECG classification across seven hospitals by implementing Federated Learning (FL) with and without Differential Privacy (DP). Using a CNN-based multi-label model trained on 1.56 million ECGs, the study shows FL achieves performance near that of pooled training, with notable benefits for hospitals with smaller datasets in inter-site evaluation. Incorporating DP via DP-SGD introduces a tangible privacy-utility trade-off: stronger privacy (smaller $\epsilon$) degrades AUROC, while looser privacy improves performance but reduces confidentiality guarantees. The findings support privacy-preserving cross-institution ECG analytics, offering practical implications for regulatory-compliant, large-scale cardiovascular risk stratification and disease labeling.
Abstract
This paper explores how to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale electrocardiogram (ECG) data. We train a multi-label ECG classification model with FL and DP on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. FL allows the hospitals to train a model collaboratively without sharing raw data, while still producing robust ECG classification models for diagnosing various cardiac conditions. Such models can support diagnosis while preserving patient confidentiality. Our results show that the performance of our FL implementation is comparable to that of the pooled approach, in which the model is trained on the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model rather than relying on single-site training. In addition, this study demonstrates the trade-off between model performance and data privacy that arises when DP is employed during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.
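
To make the two mechanisms named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a FedAvg-style aggregation (a sample-size-weighted average of per-hospital parameters), which the text above does not specify, and shows the core DP-SGD update (per-example gradient clipping followed by Gaussian noise). The function names `fedavg` and `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and chosen only for illustration.

```python
import math
import random


def fedavg(site_params, site_sizes):
    """Assumed FedAvg-style aggregation: average each parameter across
    sites, weighting every site by its number of training examples."""
    total = sum(site_sizes)
    n_params = len(site_params[0])
    return [
        sum(size * params[j] for params, size in zip(site_params, site_sizes)) / total
        for j in range(n_params)
    ]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Core DP-SGD step: clip each per-example gradient to L2 norm
    `clip_norm`, sum the clipped gradients, add Gaussian noise with
    standard deviation `noise_multiplier * clip_norm`, then average."""
    n_params = len(per_example_grads[0])
    summed = [0.0] * n_params
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose norm exceeds clip_norm.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for j, g in enumerate(grad):
            summed[j] += g * scale
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    return [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]


if __name__ == "__main__":
    # Two hypothetical hospitals with 1 and 3 training examples.
    global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
    print("aggregated parameters:", global_params)

    rng = random.Random(0)
    noisy_grad = dp_sgd_gradient(
        [[3.0, 4.0], [0.0, 1.0]], clip_norm=1.0, noise_multiplier=1.1, rng=rng
    )
    print("noisy averaged gradient:", noisy_grad)
```

In this sketch a larger `noise_multiplier` corresponds to stronger privacy (smaller $\epsilon$) at the cost of a noisier gradient, which is the privacy-utility trade-off described above. Translating a noise level into a specific $\epsilon$ requires a privacy accountant, which is omitted here.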
