Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications

Marziyeh Mohammadi; Mohsen Vejdanihemmat; Mahshad Lotfinia; Mirabela Rusu; Daniel Truhn; Andreas Maier; Soroosh Tayebi Arasteh

TL;DR

This scoping review synthesizes applications of DP in medical DL across centralized and federated settings, identifies key gaps in fairness auditing and standardization, and outlines priorities for equitable, clinically robust privacy-preserving DL.

Abstract

Differential privacy (DP) is a key technique for protecting sensitive patient data in medical deep learning (DL). As clinical models grow more data-dependent, balancing privacy with utility and fairness has become a critical challenge. This scoping review synthesizes recent developments in applying DP to medical DL, with a particular focus on DP-SGD and alternative mechanisms across centralized and federated settings. Using a structured search strategy, we identified 74 studies published up to March 2025. Our analysis spans diverse data modalities, training setups, and downstream tasks, and highlights the tradeoffs between privacy guarantees, model accuracy, and subgroup fairness. We find that while DP, especially at moderate privacy budgets, can preserve performance in well-structured imaging tasks, severe degradation often occurs under strict privacy, particularly in underrepresented or complex modalities. Furthermore, privacy-induced performance gaps disproportionately affect some demographic subgroups, with fairness impacts varying by data type and task. A small subset of studies explicitly addresses these tradeoffs through subgroup analysis or fairness metrics, but most omit them entirely. Beyond DP-SGD, emerging approaches leverage alternative mechanisms, generative models, and hybrid federated designs, though reporting remains inconsistent. We conclude by outlining key gaps in fairness auditing, standardization, and evaluation protocols, offering guidance for future work toward equitable and clinically robust privacy-preserving DL systems in medicine.
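
For orientation, the standard definition behind the privacy budget ($\epsilon$, $\delta$) discussed throughout this review can be summarized as below, together with the Gaussian mechanism's noise calibration and the basic sequential composition rule. These are general textbook statements given as a reference sketch; they are not necessarily the exact notation or privacy accounting used in the reviewed studies.

```latex
% (epsilon, delta)-DP: a randomized mechanism M satisfies, for all neighboring
% datasets D, D' (differing in one individual's record) and all output sets S,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta

% Gaussian mechanism: release f(D) + \mathcal{N}(0, \sigma^2 I), with L2 sensitivity
% \Delta_2 f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2. For \epsilon \in (0, 1),
% it is (\epsilon, \delta)-DP whenever
\sigma \;>\; \frac{\Delta_2 f \,\sqrt{2 \ln(1.25/\delta)}}{\epsilon}

% Basic sequential composition: running k mechanisms, each (\epsilon_i, \delta_i)-DP,
% on the same data is
\Big(\textstyle\sum_{i=1}^{k} \epsilon_i,\; \sum_{i=1}^{k} \delta_i\Big)\text{-DP}
```

The first line is what a privacy budget constrains; the second shows why more noise (larger $\sigma$) corresponds to a smaller $\epsilon$, as depicted in Figure 1; the third is the simplest way to track a cumulative budget across repeated releases, although DP-SGD implementations typically rely on tighter accountants such as Rényi DP.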

Paper Structure

This paper contains 1 section, 23 equations, 4 figures, 7 tables.

Supplementary information

Figures (4)

Figure 1: Overview of differential privacy (DP) applied to model training and inference. (a) Training samples are processed individually to compute per-sample gradients, which are clipped to bound each sample's influence (sensitivity) and then perturbed with noise drawn from a Gaussian distribution with variance $\sigma^2$. Higher noise (larger $\sigma$) yields stronger privacy (smaller $\epsilon$), and the resulting noisy gradients are aggregated to update the model parameters. (b) DP bounds how much the model's output distribution can change when a single individual's data are added to or removed from the training set. When noise is low (larger $\epsilon$), outputs for neighboring datasets may differ noticeably, making membership inference easier. With stronger privacy (smaller $\epsilon$), the output distributions overlap and an individual's influence becomes indistinguishable. The parameter $\delta$ can be read informally as the probability that the $\epsilon$ guarantee fails to hold. (c) Without DP, an attacker with access to a deployed model may exploit output differences to infer whether a specific sample was included in training, potentially revealing sensitive patient information (e.g., metadata or identifiers). Representative chest X-ray images are provided by the ChestX-ray14 dataset from the NIH Clinical Center wang2017chestx.
Figure 2: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) flow diagram illustrating the selection process for this study. The diagram (tricco2018prisma, page2021prisma) details the number of records identified through database and manual searches, duplicates removed, records screened by title and abstract, full-text articles assessed for eligibility, and the final number of studies included.
Figure 3: Overview of study characteristics across included papers. (a) Geographic distribution of included studies based on author affiliations. Color intensity reflects the number of publications with at least one author affiliated with each country; all affiliations of all coauthors of each study are counted. The four most represented countries are the USA, China, the UK, and Germany, with 29, 24, 13, and 12 studies, respectively. Distributions of all included studies (n=74) are shown as bar charts by (b) publication year, (c) publication type, and (d) training paradigm. Distributions of DP-SGD studies (n=67) are shown as bar charts by (e) data modality, (f) downstream task, and (g) architecture type. Some studies used multiple modalities, downstream tasks, or architectures; all such instances were counted, so categories are not mutually exclusive. Bar heights indicate counts. FL: federated learning; MLP: multilayer perceptron; CNN: convolutional neural network; GAN: generative adversarial network; GNN: graph neural network; ViT: vision transformer.
Figure 4: Integrating differential privacy (DP) into the lifecycle of medical deep learning models. (a) Institutions curate data and remove direct identifiers or metadata before training begins. (b) DP can be applied in several configurations: centralized DP, where all data are stored on a single server and DP-SGD is applied during training; federated learning (FL) with server-side DP, where institutions keep data locally and send model updates to a central server, which adds noise before aggregation; and FL with local DP, where each institution adds noise before sharing its updates, providing stronger protection against an untrusted server. (c) The trained model is distributed to hospitals or clinical sites for inference. (d) After deployment, model behavior is monitored for privacy and security risks, including anomaly detection, privacy-leakage checks (e.g., membership inference signals), and tracking of the cumulative privacy budget when repeated queries or fine-tuning occur. Representative chest X-ray images are provided by the ChestX-ray14 dataset from the NIH Clinical Center wang2017chestx.
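
The DP-SGD update described in Figure 1(a), and used for centralized DP in Figure 4(b), can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reviewed studies' implementation: per-example gradients are assumed to be already computed as a 2-D array over a flat parameter vector, and the names `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are hypothetical. Following the common DP-SGD formulation, Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added once to the sum of clipped gradients; no privacy accounting is performed here.

```python
import numpy as np


def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative (hypothetical) DP-SGD update on a flat parameter vector.

    params: shape (num_params,); per_sample_grads: shape (batch_size, num_params).
    """
    batch_size = per_sample_grads.shape[0]
    # 1) Clip each example's gradient to an L2 norm of at most clip_norm,
    #    which bounds any single example's contribution (sensitivity).
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # 2) Sum the clipped gradients and add Gaussian noise once, with
    #    standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_sum = clipped.sum(axis=0) + noise
    # 3) Average over the batch and take an ordinary gradient step.
    return params - lr * noisy_sum / batch_size


# Usage: two examples with gradient norms 5.0 and 0.5; only the first is clipped.
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0, 0.0], [0.3, 0.4, 0.0]])
new_params = dp_sgd_step(np.zeros(3), grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
print(new_params)
```

Setting `noise_multiplier` to zero reduces the step to clipped, non-private SGD, which is a convenient sanity check; raising it (larger $\sigma$) strengthens privacy at the cost of noisier updates.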

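Figure 4(b) distinguishes FL with server-side DP from FL with local DP by where the noise is injected. The NumPy sketch below contrasts the two placements for a single aggregation round, assuming flat client update vectors and a clipping bound and noise multiplier shared by all clients. In the server-side variant, noise is added once to the sum of clipped updates; real systems differ in exactly where the server injects noise. All function names are hypothetical, and no privacy accounting is performed.

```python
import numpy as np


def clip_update(update, clip_norm):
    # Scale a client's update so its L2 norm is at most clip_norm.
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))


def server_side_dp_average(updates, clip_norm, noise_multiplier, rng):
    # Server-side DP: the (trusted) server clips each client update,
    # sums them, adds Gaussian noise once, then averages.
    total = np.sum([clip_update(u, clip_norm) for u in updates], axis=0)
    total = total + rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return total / len(updates)


def local_dp_average(updates, clip_norm, noise_multiplier, rng):
    # Local DP: each client clips and noises its own update before sharing,
    # so the server only ever sees noisy updates.
    noisy = [clip_update(u, clip_norm)
             + rng.normal(0.0, noise_multiplier * clip_norm, size=u.shape)
             for u in updates]
    return np.mean(noisy, axis=0)


# Usage: 10 clients sending all-zero updates isolates the injected noise.
rng = np.random.default_rng(0)
zero_updates = [np.zeros(20_000) for _ in range(10)]
central = server_side_dp_average(zero_updates, 1.0, 1.0, rng)
local = local_dp_average(zero_updates, 1.0, 1.0, rng)
print(np.std(central), np.std(local))  # roughly 0.1 vs roughly 0.32
```

With this setup, the noise in the averaged result has standard deviation $\sigma C/n$ for the server-side variant versus $\sigma C/\sqrt{n}$ for the local variant ($\sigma$ = `noise_multiplier`, $C$ = `clip_norm`, $n$ = number of clients). This illustrates why local DP, while removing the need to trust the server, typically costs more utility at the same per-release noise scale.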