Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data
Yukai Xu, Jingfeng Zhang, Yujie Gu
TL;DR
This work tackles privacy and heterogeneity in healthcare federated learning (FL) by introducing Abstention-Aware Federated Voting (AAFV), which enables collaborative training of heterogeneous local models without sharing their parameters. AAFV applies a threshold-based abstention mechanism to perturbed predictions and consolidates the remaining high-confidence votes into pseudo labels on a public unlabeled dataset; the predictions are protected by a local differential privacy (DP) layer based on a piecewise mechanism. The framework is evaluated on diabetes prediction and MIMIC-III ICU mortality tasks, where it shows consistent accuracy gains over FedAvg and non-federated baselines at a fixed privacy budget of $\epsilon=1$, together with reduced communication cost. The results indicate that AAFV effectively balances data privacy, model confidentiality, and predictive utility in practical healthcare settings, offering a scalable approach to heterogeneous FL with sensitive data.
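To make the local DP layer concrete, here is a minimal sketch of the standard one-dimensional piecewise mechanism, which perturbs a scalar in [-1, 1] under epsilon-LDP and returns an unbiased estimate in [-C, C]. It assumes, hypothetically, that a client first rescales its predicted probability p to t = 2p - 1; how AAFV actually encodes predictions before perturbation is not specified in this section, and the function name `piecewise_mechanism` is illustrative only.

```python
import math
import random


def piecewise_mechanism(t: float, epsilon: float, rng: random.Random) -> float:
    """Perturb a scalar t in [-1, 1] with the piecewise mechanism (epsilon-LDP).

    Hypothetical helper for illustration; the output lies in [-C, C] and is an
    unbiased estimate of t.
    """
    if not -1.0 <= t <= 1.0:
        raise ValueError("t must lie in [-1, 1]")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    e_half = math.exp(epsilon / 2)
    C = (e_half + 1) / (e_half - 1)          # half-width of the output range
    left = (C + 1) / 2 * t - (C - 1) / 2     # l(t)
    right = left + C - 1                     # r(t)
    if rng.random() < e_half / (e_half + 1):
        # High-probability branch: uniform on the central interval [l(t), r(t)].
        return rng.uniform(left, right)
    # Low-probability branch: uniform on [-C, l(t)) U (r(t), C], total length C + 1.
    u = rng.uniform(0.0, C + 1)
    left_len = left + C
    if u < left_len:
        return -C + u
    return right + (u - left_len)


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.7                  # hypothetical predicted probability of the positive class
    t = 2 * p - 1            # rescale to [-1, 1] (assumed encoding)
    reports = [piecewise_mechanism(t, 1.0, rng) for _ in range(100_000)]
    # Each report is very noisy, but their mean should be close to t = 0.4.
    print(sum(reports) / len(reports))
```

At $\epsilon=1$ the output range is roughly [-4.08, 4.08], so a single perturbed report says little on its own; the mechanism is useful because it is unbiased and its noise shrinks when many reports are aggregated.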
Abstract
In healthcare, where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge is privacy leakage when data containing sensitive personal information are centralized. The model-level challenge arises from the heterogeneity of local models, which must be trained collaboratively while remaining confidential to address intellectual property concerns. To tackle these challenges, we propose a new framework termed Abstention-Aware Federated Voting (AAFV) that collaboratively and confidentially trains heterogeneous local models while simultaneously protecting data privacy. This is achieved by applying a novel abstention-aware voting mechanism and a differential privacy mechanism to the local models' predictions. In particular, the proposed abstention-aware voting mechanism uses a threshold-based abstention method to select high-confidence votes from the heterogeneous local models, which not only enhances learning utility but also protects model confidentiality. Furthermore, we implement AAFV on two practical prediction tasks: diabetes and in-hospital patient mortality. The experiments demonstrate the effectiveness of AAFV in terms of test accuracy and its confidentiality in terms of privacy protection.
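The abstract describes the voting step only at a high level. The following minimal sketch shows threshold-based abstention and vote consolidation for a binary task, assuming each client reports a (possibly DP-perturbed) score for P(y = 1) on every sample of the public unlabeled set. The names `abstaining_vote` and `pseudo_label`, the threshold `tau`, the quorum `min_votes`, and the tie rule are hypothetical illustrations, not AAFV's exact rule.

```python
from typing import List, Optional, Sequence


def abstaining_vote(p: float, tau: float) -> Optional[int]:
    """Hypothetical rule: vote 1 or 0 only when the score is confident, else abstain."""
    if p >= tau:
        return 1
    if p <= 1.0 - tau:
        return 0
    return None  # abstain: score too close to the decision boundary


def pseudo_label(scores: Sequence[Sequence[float]], tau: float = 0.8,
                 min_votes: int = 2) -> List[Optional[int]]:
    """Consolidate per-client votes into pseudo labels for a public unlabeled set.

    scores[k][i] is client k's (possibly perturbed) estimate of P(y=1) for sample i.
    Returns one label per sample, or None when the sample should stay unlabeled.
    """
    if not 0.5 < tau <= 1.0:
        raise ValueError("tau must lie in (0.5, 1]")
    if not scores:
        return []
    labels: List[Optional[int]] = []
    for i in range(len(scores[0])):
        votes = [abstaining_vote(client[i], tau) for client in scores]
        ones = sum(1 for v in votes if v == 1)
        zeros = sum(1 for v in votes if v == 0)
        if ones + zeros < min_votes or ones == zeros:
            labels.append(None)  # too few confident votes, or a tie
        else:
            labels.append(1 if ones > zeros else 0)
    return labels


if __name__ == "__main__":
    scores = [
        [0.95, 0.85, 0.10, 0.85],  # client A
        [0.90, 0.40, 0.05, 0.15],  # client B
        [0.70, 0.55, 0.15, 0.90],  # client C
    ]
    print(pseudo_label(scores, tau=0.8, min_votes=2))  # → [1, None, 0, 1]
```

In this toy run, sample 1 stays unlabeled because only client A is confident enough to vote, which illustrates how abstention trades label coverage for label quality; only the pseudo labels, not any model parameters, need to leave the clients.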
