Personalized and privacy-preserving federated heterogeneous medical image analysis with PPPML-HMI

Juexiao Zhou, Longxi Zhou, Di Wang, Xiaopeng Xu, Haoyang Li, Yuetan Chu, Wenkai Han, Xin Gao

TL;DR

PPPML-HMI tackles heterogeneous medical imaging data under federated learning (FL) by delivering personalized and privacy-preserving training without modifying model architectures. It integrates the PerFedAvg personalization approach with a novel cyclic secure aggregation using homomorphic encryption (CSAHE) to enable decentralized secure gradient aggregation. On RAD-ChestCT classification and COVID-19 CT segmentation, PPPML-HMI is robust across varying numbers of users and data sizes, achieving a ~5% higher Dice score on average than standard FL in the real-world segmentation task, while resisting gradient-based privacy attacks such as iDLG. The method is open-source and plug-and-play, offering practical privacy and personalization for medical institutions with heterogeneous devices, though it incurs additional computational overhead compared to FL and has a vulnerability when the number of clients is exactly two.

Abstract

Heterogeneous data is endemic in medical imaging because hospitals use devices of diverse models and settings. However, there are few open-source frameworks for federated heterogeneous medical image analysis that provide personalization and privacy protection simultaneously without requiring changes to existing model structures or the sharing of any private data. In this paper, we propose PPPML-HMI, an open-source learning paradigm for personalized and privacy-preserving federated heterogeneous medical image analysis. To the best of our knowledge, personalization and privacy protection were achieved simultaneously for the first time under the federated scenario by integrating the PerFedAvg algorithm and designing our novel cyclic secure aggregation with the homomorphic encryption algorithm. To show the utility of PPPML-HMI, we applied it to a simulated classification task, namely the classification of healthy people and patients from the RAD-ChestCT Dataset, and one real-world segmentation task, namely the segmentation of lung infections from COVID-19 CT scans. For the real-world task, PPPML-HMI achieved a ~5% higher Dice score on average compared to conventional FL under the heterogeneous scenario. Meanwhile, we applied improved deep leakage from gradients (iDLG) to simulate adversarial attacks and showed the solid privacy-preserving capability of PPPML-HMI. By applying PPPML-HMI to both tasks with different neural networks, varied numbers of users, and varied sample sizes, we further demonstrated the strong robustness of PPPML-HMI.
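
To make the idea of secure aggregation with homomorphic encryption concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy Paillier cryptosystem (additively homomorphic) with small fixed primes, a fixed-point encoding of gradients, and a simple ring in which each user multiplies its encrypted gradient into a running ciphertext. The names `encrypt`, `decrypt`, `ring_aggregate`, and `decode_mean`, the constant `SCALE`, and the question of who holds the private key are all assumptions made for illustration; the text above does not specify them.

```python
import math
import secrets

# Toy Paillier parameters (assumption: small fixed primes, illustration only).
P, Q = 2**31 - 1, 2**61 - 1            # two Mersenne primes
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                   # valid because the generator is N + 1
SCALE = 10**6                          # fixed-point scale for float gradients


def encrypt(m: int) -> int:
    """Paillier encryption of an integer (negatives wrap modulo N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Paillier decryption; values above N/2 are read as negative."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def ring_aggregate(user_gradients):
    """Pass a running ciphertext around the ring; each user adds its share.

    Multiplying Paillier ciphertexts adds the underlying plaintexts, so no
    user sees another user's gradient in the clear.
    """
    running = None
    for grad in user_gradients:        # one hop per user
        enc = [encrypt(round(g * SCALE)) for g in grad]
        if running is None:
            running = enc
        else:
            running = [a * b % N2 for a, b in zip(running, enc)]
    return running


def decode_mean(ciphertexts, num_users):
    """Key holder (hypothetical role) decrypts the sum and averages it."""
    return [decrypt(c) / SCALE / num_users for c in ciphertexts]


grads = [[0.5, -1.0], [0.25, 2.0], [-0.75, 0.5]]
mean_grad = decode_mean(ring_aggregate(grads), len(grads))
```

Any scheme that reveals only the aggregate still lets one of two users recover the other's gradient by subtracting its own contribution, which is consistent with the two-client caveat in the TL;DR.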

Paper Structure (15 sections, 4 figures, 3 tables, 2 algorithms)

This paper contains 15 sections, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Scheme of PPPML-HMI. In our real-world case, hospitals use devices from different manufacturers, with various models and settings, to detect lung infection caused by COVID-19. The use of diverse devices generates data with inherent differences, namely heterogeneous data. With FL, the goal is to jointly train a consensus model with the data from each hospital without sharing the data itself. With homogeneous data across hospitals, FL can efficiently train a server model that works well for all hospitals. However, when hospitals have heterogeneous data, the server model trained by FL may perform poorly when applied to each hospital. PPPML-HMI therefore allows models to adapt to heterogeneous data. To strengthen the privacy protection of PPPML-HMI, we designed the cyclic secure aggregation with homomorphic encryption.
  • Figure 2: A) Illustration of the two tasks. We applied PPPML-HMI to the classification of healthy people and patients with 3D DenseNet on the RAD-ChestCT Dataset, and to the segmentation of COVID-19 lung infections with a 2.5D U-Net method [zhou2020rapid, zhou2022interpretable]. B, C) Illustration of the communication network and attackers of FL (B) and PPPML-HMI (C). Two types of attackers exist in our setting: 1) attackers who can intercept messages sent from any user to the server or between users (type I), and 2) honest-but-curious attackers who are among the users of PPPML-HMI (type II).
  • Figure 3: A) UMAP dimension reduction and clustering, grouped by manufacturer, indicating that CT scans generated by different CT scanners have significant inherent differences. B) Heatmap of the segmentation Dice scores obtained when applying models trained centrally on the data of each hospital. C) Bar plot of the Dice scores of models trained centrally, with federated learning, and with PPPML-HMI.
  • Figure 4: A) Visualization of the predicted segmentation masks on the high-quality sample A000069 and the low-quality sample A000075 (Orange: true positives, Green: false positives, Yellow: false negatives). B) Visualization of the dummy data recovered by the iDLG attack on FL and PPPML-HMI against both types of attackers.
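
The Dice score reported in Figures 3 and 4 can be computed from the true-positive, false-positive, and false-negative counts that Figure 4A colors. The following is a minimal sketch assuming flat binary masks of equal length; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
score = dice_score(predicted, reference)  # 1 TP, 1 FP, 1 FN
```

A 3D CT mask would be flattened to one sequence before the call; averaging per-scan scores versus pooling voxels across scans gives different numbers, and the choice here is left open.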