Personalized and privacy-preserving federated heterogeneous medical image analysis with PPPML-HMI
Juexiao Zhou, Longxi Zhou, Di Wang, Xiaopeng Xu, Haoyang Li, Yuetan Chu, Wenkai Han, Xin Gao
TL;DR
PPPML-HMI addresses heterogeneous medical imaging data in federated learning (FL) by providing personalized, privacy-preserving training without modifying model architectures. It combines the PerFedAvg personalization algorithm with a novel cyclic secure aggregation scheme based on homomorphic encryption (CSAHE), which enables decentralized, secure gradient aggregation. On RAD-ChestCT classification and COVID-19 CT lung-infection segmentation, PPPML-HMI remains robust across different numbers of users and data sizes, achieves a ~5% higher Dice score on average than conventional FL in the real-world segmentation task, and resists gradient-based privacy attacks such as iDLG. The method is open-source and plug-and-play, offering practical privacy and personalization for medical institutions with heterogeneous devices. It incurs additional computational overhead compared to FL and is vulnerable when the number of clients is exactly two.
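The two-client limitation follows from arithmetic alone: any scheme that reveals only the sum of gradients still lets a participant subtract its own contribution, and with two clients the remainder is exactly the other client's gradient. The sketch below illustrates this with a toy ring aggregation. It is not the CSAHE protocol: a random additive mask stands in for homomorphic encryption, and the names `ring_aggregate` and `recover_other` are hypothetical, chosen only for this illustration.

```python
import random


def ring_aggregate(gradients, seed=0):
    """Toy ring aggregation over integer gradient vectors.

    Assumption: a random additive mask replaces homomorphic encryption,
    so intermediate clients only ever see masked running totals.
    """
    rng = random.Random(seed)
    dim = len(gradients[0])
    # The initiating client draws a mask and starts the running total with it.
    mask = [rng.randint(-10**6, 10**6) for _ in range(dim)]
    running = list(mask)
    # Each client in the ring adds its own gradient to the masked total.
    for grad in gradients:
        running = [r + g for r, g in zip(running, grad)]
    # Back at the initiator, the mask is removed to obtain the plain sum.
    return [r - m for r, m in zip(running, mask)]


def recover_other(total, own):
    """With exactly two clients, total minus own is the other's gradient."""
    return [t - o for t, o in zip(total, own)]


if __name__ == "__main__":
    # Three clients: the sum alone does not pin down any single gradient.
    print(ring_aggregate([[1, 2], [3, 4], [5, 6]]))
    # Two clients: client A recovers client B's gradient exactly.
    total = ring_aggregate([[1, 2], [3, 4]])
    print(recover_other(total, [1, 2]))
```

With three or more clients, the same subtraction only yields the sum of the remaining clients' gradients, which is why the leak is specific to the two-client case.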
Abstract
Heterogeneous data is endemic in medical imaging because hospitals use devices of diverse models and settings. However, few open-source frameworks for federated heterogeneous medical image analysis provide personalization and privacy protection simultaneously without requiring changes to existing model structures or the sharing of any private data. In this paper, we propose PPPML-HMI, an open-source learning paradigm for personalized and privacy-preserving federated heterogeneous medical image analysis. To the best of our knowledge, this is the first time personalization and privacy protection have been achieved simultaneously in the federated setting, which we do by integrating the PerFedAvg algorithm with our novel cyclic secure aggregation based on homomorphic encryption. To show the utility of PPPML-HMI, we apply it to one simulated classification task, distinguishing healthy people from patients in the RAD-ChestCT dataset, and one real-world segmentation task, segmenting lung infections in COVID-19 CT scans. On the real-world task, PPPML-HMI achieves a $\sim$5\% higher Dice score on average than conventional FL under the heterogeneous scenario. We also use improved deep leakage from gradients (iDLG) to simulate adversarial attacks and show that PPPML-HMI preserves privacy against them. By applying PPPML-HMI to both tasks with different neural networks, varying numbers of users, and varying sample sizes, we further demonstrate its robustness.
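For readers unfamiliar with the segmentation metric quoted above, the Dice score between a predicted mask $P$ and a ground-truth mask $T$ is $2|P \cap T| / (|P| + |T|)$. The following is a minimal sketch for flattened binary masks, not the evaluation code used in the paper. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient for two flattened binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled positive in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    if denominator == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denominator


if __name__ == "__main__":
    # One overlapping voxel, two predicted, one true: 2 * 1 / (2 + 1).
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```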
