Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data

Richard Osuala, Daniel M. Lang, Anneliese Riess, Georgios Kaissis, Zuzanna Szafranowska, Grzegorz Skorupko, Oliver Diaz, Julia A. Schnabel, Karim Lekadir

TL;DR

This work tackles privacy concerns in deep learning for breast cancer detection from mammography by evaluating two privacy-preserving strategies: differentially private stochastic gradient descent (DP-SGD) and training on synthetic data generated by a malignancy-conditioned GAN (MCGAN). A Swin Transformer classifier performs mass malignancy classification on patch-based mammography data, with experiments on internal and external test sets. The study shows that synthetic data can improve privacy-utility tradeoffs under DP constraints and that pretraining on synthetic data followed by DP fine-tuning yields robust gains, reaching an AUPRC of about 0.74 in a private setting. Overall, the results support the clinical viability of private high-utility deep diagnostic models and point to DP training of generators and diffusion-based synthesis as directions for strengthening privacy guarantees.

Abstract

Deep learning holds immense promise for aiding radiologists in breast cancer detection. However, achieving optimal model performance is hampered by limited availability and sharing of data, commonly due to patient privacy concerns. Such concerns are further exacerbated because traditional deep learning models can inadvertently leak sensitive training information. This work addresses these challenges by exploring and quantifying the utility of privacy-preserving deep learning techniques, concretely, (i) differentially private stochastic gradient descent (DP-SGD) and (ii) fully synthetic training data generated by our proposed malignancy-conditioned generative adversarial network. We assess these methods via downstream malignancy classification of mammography masses using a transformer model. Our experimental results show that synthetic data augmentation can improve privacy-utility tradeoffs in differentially private model training. Further, model pretraining on synthetic data achieves remarkable performance, which can be further increased with DP-SGD fine-tuning across all privacy guarantees. With this first in-depth exploration of privacy-preserving deep learning in breast imaging, we address current and emerging clinical privacy requirements and pave the way towards the adoption of private high-utility deep diagnostic models. Our reproducible codebase is publicly available at https://github.com/RichardObi/mammo_dp.
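
For readers unfamiliar with DP-SGD, the core mechanism named in the abstract can be sketched in a few lines: each example's gradient is clipped to a fixed L2 norm, Gaussian noise scaled to that norm is added to the sum, and the noisy average drives the update. The snippet below is a minimal illustration on a toy logistic regression model using NumPy. It is not taken from the paper's codebase; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this sketch, and no privacy accounting (tracking of the spent privacy budget) is included.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD step for logistic regression (toy sketch)."""
    # Per-example gradients of the logistic loss: (sigmoid(x.w) - y) * x
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X  # shape (batch, dim)

    # Clip each example's gradient so its L2 norm is at most clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping norm, then average
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / X.shape[0]

    # Plain gradient descent update with the privatized gradient
    return w - lr * noisy_mean_grad

# Toy usage on random data (assumed shapes: 8 examples, 4 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = rng.integers(0, 2, size=8).astype(float)
w = dp_sgd_step(np.zeros(4), X, y, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single patient's example can move the model, and the noise masks the remaining contribution; a larger `noise_multiplier` gives stronger privacy at the cost of utility, which is the tradeoff the paper studies.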

Paper Structure

This paper contains 8 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of our privacy-preserving deep learning pipeline and malignancy-conditioned generative adversarial network (MCGAN).
  • Figure 2: Qualitative and quantitative synthesis results. Images are randomly selected malignant and benign real (CBIS-DDSM [Lee2017]) and MCGAN-generated masses. FID [heusel2017gans] and FRD [osuala2024towards] scores, based on ImageNet [deng2009imagenet] and RadImageNet [osuala2023medigan, mei2022radimagenet], are reported as mean $\pm$ standard deviation over 3 subsets randomly sampled per patient ($N_{\mathrm{real}} \approx 360$, $N_{\mathrm{syn}} \approx 3240$). Row 4 indicates a BCDR-based [lopez2012bcdr] upper bound for comparison with the synthetic data metrics in row 1.