Privacy-preserving machine learning for healthcare: open challenges and future perspectives
Alejandro Guerra-Manzanares, L. Julian Lechuga Lopez, Michail Maniatakos, Farah E. Shamout
TL;DR
This paper surveys privacy-preserving machine learning (PPML) in healthcare, distinguishing PPML for training from PPML for inference and examining core techniques such as federated learning, differential privacy, homomorphic encryption, and secure multi-party computation. It highlights that current work is skewed toward single-modality datasets, internal validation, and convolutional neural networks (CNNs), and that substantial computational and engineering trade-offs limit generalization and real-world adoption. The authors identify open challenges (privacy-accuracy trade-offs, resource demands, and centralization risks) alongside opportunities in machine learning as a service (MLaaS), multi-modal learning, and the integration of state-of-the-art architectures. They advocate for broader, multi-institutional evaluations, diverse data modalities, and explainability considerations to move toward trustworthy, private ML in clinical practice.
Abstract
Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we review recent literature on Privacy-Preserving Machine Learning (PPML) for healthcare. We focus primarily on privacy-preserving training and inference-as-a-service, and we comprehensively review existing trends, identify challenges, and discuss opportunities for future research. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospect of translating research efforts into real-world settings.
