Privacy in Practice: Private COVID-19 Detection in X-Ray Images (Extended Version)

Lucas Lange, Maja Schneider, Peter Christen, Erhard Rahm

TL;DR

This work tackles privacy in COVID-19 X-ray classification by training models under Differential Privacy with DP-SGD, addressing data imbalance and evaluating a range of privacy budgets. It couples architecture and pre-training choices (e.g., tanh activations, pneumonia pre-training) with an empirical privacy assessment that uses black-box MIAs to gauge practical privacy leakage. The key finding is that strengthening the DP guarantee reduces empirical leakage only marginally: MIA success plateaus across budgets, and the privacy level actually needed depends on the task-specific threat from MIAs. The authors advocate attack-aware, empirical privacy analyses for tuning utility-privacy trade-offs and suggest that DP should be tuned rather than maximized blindly, preserving model utility while maintaining credible privacy guarantees.

Abstract

Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML models that satisfy Differential Privacy (DP). Previous works exploring private COVID-19 models are in part based on small datasets, provide weaker or unclear privacy guarantees, and do not investigate practical privacy. We suggest improvements to address these open gaps. We account for inherent class imbalances and evaluate the utility-privacy trade-off more extensively and over stricter privacy budgets. Our evaluation is supported by empirically estimating practical privacy through black-box Membership Inference Attacks (MIAs). The introduced DP should help limit leakage threats posed by MIAs, and our practical analysis is the first to test this hypothesis on the COVID-19 classification task. Our results indicate that needed privacy levels might differ based on the task-dependent practical threat from MIAs. The results further suggest that with increasing DP guarantees, empirical privacy leakage only improves marginally, and DP therefore appears to have a limited impact on practical MIA defense. Our findings identify possibilities for better utility-privacy trade-offs, and we believe that empirical attack-specific privacy estimation can play a vital role in tuning for practical privacy.
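
The TL;DR names DP-SGD as the training method. As a rough illustration of the general technique only (not the authors' implementation), the sketch below performs one DP-SGD update in NumPy: each per-example gradient is clipped to a maximum L2 norm, Gaussian noise scaled to that norm is added to the sum, and the noisy average drives the step. The function name `dp_sgd_step` and all parameter names are hypothetical, and the values in the usage lines are arbitrary.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative sketch; names are hypothetical).

    params:            array of shape (dim,)
    per_example_grads: array of shape (batch, dim), one gradient per example
    clip_norm:         maximum L2 norm allowed for each per-example gradient
    noise_multiplier:  Gaussian noise std, expressed as a multiple of clip_norm
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # Scale down every gradient whose L2 norm exceeds clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Add Gaussian noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return params - lr * noisy_mean


# Usage with arbitrary toy values.
rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.0]])
new_params = dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
```

A larger noise multiplier generally corresponds to a stricter privacy budget at the cost of utility. Converting a multiplier and a number of steps into an $(\varepsilon, \delta)$ guarantee requires a privacy accountant, which this sketch omits.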

Paper Structure

This paper contains 31 sections, 1 theorem, 9 equations, 2 figures, 4 tables.

Key Result

Theorem 1

Let $A$ be an $(\varepsilon, \delta)$-differentially private learning algorithm, $\mathcal{A}$ be a membership adversary, $\mathsf{Adv}^\mathsf{M}$ be the membership advantage of $\mathcal{A}$, $n$ be a positive integer, and $D$ be a distribution over data points $(x, y)$. Then we have:

Figures (2)

  • Figure 1: Chest X-ray images of different patients extracted from the COVID-19 Radiography Database [chowdhury2020can, rahman2021exploring]. COVID-19 positive scans are characterized by patchy consolidations of the lungs.
  • Figure 2: Empirical privacy leakage from MIAs, given as the membership advantage ($\mathsf{Adv}^\mathsf{M}$) with 95% confidence intervals and plotted across the different privacy budgets. The legend distinguishes the model variants. We exclude data points with an F1-score below 50% because low performance disproportionately reduces leakage. A dotted line shows the DP bound from [yeom2018privacy].
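
Figure 2 reports leakage as the membership advantage with a 95% CI. As a minimal sketch of how such a number could be estimated from a black-box attack's binary guesses, the code below uses the common formulation of membership advantage as the attack's true positive rate minus its false positive rate. The percentile bootstrap used for the interval is an assumption for illustration (the source does not say how the CI is computed), and the function names and toy data are hypothetical.

```python
import numpy as np


def membership_advantage(is_member, predicted_member):
    """Advantage = TPR - FPR of the attack's membership guesses."""
    y = np.asarray(is_member, dtype=bool)
    p = np.asarray(predicted_member, dtype=bool)
    if y.all() or not y.any():
        raise ValueError("need both members and non-members")
    tpr = p[y].mean()    # fraction of members flagged as members
    fpr = p[~y].mean()   # fraction of non-members flagged as members
    return float(tpr - fpr)


def bootstrap_ci(is_member, predicted_member, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the advantage (assumed method, see lead-in).

    Members and non-members are resampled separately so that every
    resample contains both groups.
    """
    y = np.asarray(is_member, dtype=bool)
    p = np.asarray(predicted_member, dtype=bool)
    members, non_members = p[y], p[~y]
    rng = np.random.default_rng(seed)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        m = rng.choice(members, size=members.size, replace=True)
        nm = rng.choice(non_members, size=non_members.size, replace=True)
        stats[i] = m.mean() - nm.mean()
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Toy data: 4 members, 4 non-members, and the attack's guesses.
is_member = [1, 1, 1, 1, 0, 0, 0, 0]
guesses = [1, 1, 1, 0, 1, 0, 0, 0]
adv = membership_advantage(is_member, guesses)  # TPR 0.75 - FPR 0.25 = 0.5
ci_low, ci_high = bootstrap_ci(is_member, guesses)
```

An advantage of 0 means the attack does no better than random guessing, and 1 means it identifies members perfectly, which is why values near 0 across privacy budgets indicate low practical leakage.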

Theorems & Definitions (2)

  • Theorem 1
  • proof