Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data

Congyu Fang; Adam Dziedzic; Lin Zhang; Laura Oliva; Amol Verma; Fahad Razak; Nicolas Papernot; Bo Wang

TL;DR

ML models trained with the DeCaPH framework achieve an improved utility-privacy trade-off: they perform well while preserving the privacy of the training data points.

Abstract

Machine Learning (ML) has demonstrated great potential for medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model on the private datasets available at each party, without sharing those datasets directly and without compromising their privacy through the collaboration. In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). It offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets; (2) it safeguards patient privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates ML model training without relying on a centralized server. We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. We demonstrate that ML models trained with the DeCaPH framework have an improved utility-privacy trade-off, showing that it enables the models to perform well while preserving the privacy of the training data points. In addition, ML models trained with the DeCaPH framework in general outperform those trained solely on the private datasets of individual parties, showing that DeCaPH enhances model generalizability.

Paper Structure (50 sections, 13 equations, 5 figures, 2 algorithms)

Abstract
Introduction
Methods
Results
Conclusion and Discussion
Contributors
Data sharing
Declaration of interests
Acknowledgements
Declaration of generative AI and AI-assisted technologies in the writing process
Figure Legends

Figures (5)

Figure 1: An overview of the DeCaPH learning framework. (a), flowchart of the steps for one iteration of training with DeCaPH. At each communication round, a leader is first selected to perform the aggregation of the participants' model weights; each hospital locally samples a random mini-batch of data points and computes their point-wise gradients; each hospital locally clips the point-wise gradient vectors and adds calibrated Gaussian noise; all participating hospitals send their local gradients to the leader; the leader aggregates the gradients from all hospitals using SecAgg and outputs an updated model that is differentially private; all participating hospitals synchronize their model state with the leader. These steps are repeated until convergence. (b), visualization of one training iteration of DeCaPH with three participating hospitals.
Figure 2: DeCaPH to predict mortality using EHR. (a), the number of health records available at each participating hospital ($P_1, P_2, ..., P_8$). (b), "alive" vs. "death" cases at each hospital. (c), the performance of models trained using the private datasets at each silo and models trained with all datasets using FL, PriMIA, and our DeCaPH (highlighted in purple). The experiments are repeated with 5-fold cross-validation. The figures show the first quartile, median, and third quartile, as well as the outliers (points more than $1.5 \times$ the interquartile range below the lower quartile or above the upper quartile). We perform a one-tailed Wilcoxon signed-rank test with continuity correction, using the exact method, to compare the performance of models trained with DeCaPH to those trained with PriMIA for each of the evaluation metrics. The alternative hypothesis is that models trained with DeCaPH have higher scores. The p-values are $< 0.05$ for all metrics except NPV.
Figure 3: DeCaPH to classify cell types using the single-cell human pancreas dataset. (a), the number of data points available in each participating study ($P_1, P_2, ..., P_5$). (b), the proportion of the classes in the datasets. (c), the performance (with 5-fold cross-validation) of the models trained using the private dataset of each study and the models trained with all datasets using FL, PriMIA, and DeCaPH (highlighted in purple). We break the axis for better visualization. The figures show the first quartile, median, and third quartile, as well as the outliers (points more than $1.5 \times$ the interquartile range below the lower quartile or above the upper quartile). We perform a one-tailed Wilcoxon signed-rank test with continuity correction, using the exact method, to compare the performance of models trained with DeCaPH to those trained with PriMIA for each of the evaluation metrics. The alternative hypothesis is that models trained with DeCaPH have higher scores for that metric. The p-values are $< 0.05$ for all metrics.
Figure 4: DeCaPH to identify pathologies from human chest radiology images. (a), the sizes of the datasets available in each study ($P_1, P_2, P_3$). (b), the class distribution of the datasets. (c), the AUROC for the four output labels (with 5-fold cross-validation) of the models trained using the private dataset of each study and the models trained with all datasets using FL, PriMIA, and DeCaPH (highlighted in purple). The figures show the first quartile, median, and third quartile, as well as the outliers (points more than $1.5 \times$ the interquartile range below the lower quartile or above the upper quartile). We perform a one-tailed Wilcoxon signed-rank test with continuity correction, using the exact method, to compare the performance of models trained with DeCaPH to those trained with PriMIA for each of the pathologies and "No Finding". The alternative hypothesis is that models trained with DeCaPH have higher AUROC scores. The p-values are $< 0.05$ for all three pathologies and "No Finding".
Figure 5: Models trained with DeCaPH are more robust to membership inference attacks. We perform a membership inference attack on models trained with DeCaPH vs. FL for the three case studies. The models trained with DeCaPH (Ours) are differentially private. The models trained with FL are not privacy-preserving. The target models are trained five times to plot the $95\%$ confidence interval. Attack AUROC values are reported as mean $\pm$ SD. (a), for GEMINI, the AUROC for FL is $0.620 \pm 0.043$ and that for DeCaPH is $0.521 \pm 0.003$. (b), for single-cell human pancreas, the AUROC for FL is $0.584 \pm 0.009$ and that for DeCaPH is $0.522 \pm 0.004$. (c), for chest radiology, the AUROC for FL is $0.537 \pm 0.001$ and that for DeCaPH is $0.500 \pm 0.001$.
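
The training iteration described in the Figure 1(a) caption (leader selection, local mini-batch sampling, point-wise gradient clipping, Gaussian noise, aggregation, synchronization) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear model with squared-error loss is a toy choice, `pairwise_masks` is a simplified stand-in for SecAgg (random masks that cancel in the sum), the way the noise is split across hospitals is an assumption, and all function and parameter names are hypothetical.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def local_noisy_gradient_sum(weights, batch, max_norm, noise_std, rng):
    """Clip each point-wise gradient, sum them, then add Gaussian noise."""
    total = [0.0] * len(weights)
    for x, y in batch:
        # Toy model (assumption): linear prediction with squared-error loss.
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        grad = clip([2.0 * err * xi for xi in x], max_norm)
        total = [t + g for t, g in zip(total, grad)]
    return [t + rng.gauss(0.0, noise_std) for t in total]


def pairwise_masks(num_parties, dim, rng):
    """Simplified stand-in for SecAgg: masks that sum to zero over parties."""
    masks = [[0.0] * dim for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = [rng.uniform(-1e3, 1e3) for _ in range(dim)]
            masks[i] = [m + s for m, s in zip(masks[i], shared)]
            masks[j] = [m - s for m, s in zip(masks[j], shared)]
    return masks


def decaph_style_round(weights, hospital_data, batch_size, max_norm, sigma, lr, rng):
    """One communication round; returns (leader index, updated weights)."""
    num_parties = len(hospital_data)
    leader = rng.randrange(num_parties)  # leader chosen at random (assumption)
    # Assumption: each hospital adds a share of the noise so that the
    # aggregated noise has standard deviation sigma * max_norm.
    noise_std = sigma * max_norm / math.sqrt(num_parties)
    masks = pairwise_masks(num_parties, len(weights), rng)
    masked_updates, total_examples = [], 0
    for data, mask in zip(hospital_data, masks):
        batch = rng.sample(data, min(batch_size, len(data)))
        update = local_noisy_gradient_sum(weights, batch, max_norm, noise_std, rng)
        masked_updates.append([u + m for u, m in zip(update, mask)])
        total_examples += len(batch)
    # The leader only sees masked updates; the masks cancel in the sum.
    aggregate = [sum(col) for col in zip(*masked_updates)]
    new_weights = [w - lr * a / total_examples for w, a in zip(weights, aggregate)]
    return leader, new_weights  # every hospital then adopts new_weights
```

Calling `decaph_style_round` repeatedly, feeding each round's returned weights into the next, corresponds to repeating the steps until convergence. A real deployment would replace the masking helper with an actual secure aggregation protocol and track the cumulative privacy budget with a differential privacy accountant.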
