Sensitivity, Specificity, and Consistency: A Tripartite Evaluation of Privacy Filters for Synthetic Data Generation

Adil Koeken; Alexander Ziller; Moritz Knolle; Daniel Rueckert

TL;DR

The paper critiques post-hoc privacy filters for synthetic medical data. Filters trained to flag samples that resemble training data show high apparent sensitivity on real images, yet they fail to detect near-duplicates of training images generated by diffusion models and perform poorly on unseen data. By adapting both latent-space and pixel-space privacy filters and evaluating them with a rigorous protocol covering sensitivity, specificity, and consistency, the study reveals substantial false positives, inconsistent decisions across random seeds, and weak detection of leakage. These findings challenge the practical utility of current privacy-filter pipelines: without stronger filter designs, post-hoc approaches may provide a false sense of security while leaving sensitive information exposed. This limits their deployment in clinical AI and underlines a pressing need for more reliable mechanisms to safeguard patient privacy in synthetic medical datasets.
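
The three evaluation axes can be made concrete with a short sketch. This section does not give the paper's exact metric definitions, so the following is one plausible operationalization, assuming each filter emits a binary flag per sample and each sample carries a ground-truth label saying whether it leaks training data (for example, a near-duplicate of a training image). Sensitivity is then the share of leaking samples that get flagged, specificity the share of non-leaking samples left unflagged, and consistency is assumed here to be the share of samples on which filters trained with different random seeds all reach the same decision. The function names and toy data are hypothetical and do not reproduce any results from the paper.

```python
from typing import Sequence


def _check_lengths(flags: Sequence[bool], is_leak: Sequence[bool]) -> None:
    # zip() would silently truncate mismatched inputs, so fail loudly instead.
    if len(flags) != len(is_leak):
        raise ValueError("flags and labels must have the same length")


def sensitivity(flags: Sequence[bool], is_leak: Sequence[bool]) -> float:
    """True-positive rate: share of leaking samples that the filter flags."""
    _check_lengths(flags, is_leak)
    positives = [flag for flag, leak in zip(flags, is_leak) if leak]
    if not positives:
        raise ValueError("no leaking samples to evaluate")
    return sum(positives) / len(positives)


def specificity(flags: Sequence[bool], is_leak: Sequence[bool]) -> float:
    """True-negative rate: share of non-leaking samples the filter lets through."""
    _check_lengths(flags, is_leak)
    negatives = [flag for flag, leak in zip(flags, is_leak) if not leak]
    if not negatives:
        raise ValueError("no non-leaking samples to evaluate")
    return sum(not flag for flag in negatives) / len(negatives)


def consistency(flags_per_seed: Sequence[Sequence[bool]]) -> float:
    """Assumed definition: share of samples on which every seed's filter agrees."""
    if not flags_per_seed or not flags_per_seed[0]:
        raise ValueError("need at least one seed and one sample")
    n = len(flags_per_seed[0])
    if any(len(run) != n for run in flags_per_seed):
        raise ValueError("all seeds must score the same samples")
    # A sample counts as consistent only if all seeds produced the same flag.
    unanimous = sum(1 for decisions in zip(*flags_per_seed) if len(set(decisions)) == 1)
    return unanimous / n


# Toy example (hypothetical data): the first three samples leak, the last three do not.
is_leak = [True, True, True, False, False, False]
seed_runs = [
    [True, False, False, True, False, False],  # filter trained with seed 0
    [True, True, False, False, False, True],   # filter trained with seed 1
    [True, False, False, True, False, False],  # filter trained with seed 2
]

if __name__ == "__main__":
    for seed, flags in enumerate(seed_runs):
        print(f"seed {seed}: sensitivity={sensitivity(flags, is_leak):.2f} "
              f"specificity={specificity(flags, is_leak):.2f}")
    print(f"consistency across seeds: {consistency(seed_runs):.2f}")
```

Per-sample unanimity is a deliberately strict notion of consistency; a chance-corrected agreement statistic such as Fleiss' kappa is a common alternative when filters differ in how often they flag.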

Abstract

The generation of privacy-preserving synthetic datasets is a promising avenue for overcoming data scarcity in medical AI research. Post-hoc privacy filtering techniques, designed to remove synthetic samples that contain personally identifiable information, have recently been proposed as a solution, but their effectiveness remains largely unverified. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. Contrary to claims in the original publications, our results show that current filters exhibit limited specificity and consistency: they achieve high sensitivity only for real images and fail to reliably detect near-duplicates of training data produced by the generative model. These findings expose a critical limitation of post-hoc filtering. Rather than effectively safeguarding patient privacy, these methods may provide a false sense of security while leaving unacceptable levels of patient information exposed. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
