DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection
Sihan Shang, Jiancheng Yang, Zhenglong Sun, Pascal Fua
TL;DR
DataCook introduces deployment-time protection for healthcare data by transforming raw data into protected forms via Anti-Adversarial Examples (AntiAdv), preserving usability for authorized users while hindering unauthorized use of models trained on the data. The approach formalizes an optimization in which models trained on protected data remain accurate on protected inputs but not on raw inputs, under SSIM-based perceptual constraints between raw and protected inputs, and it leverages a surrogate model to generate anti-adversarial perturbations with a pseudo-label objective. Experiments on MedMNIST across 2D, 3D, and high-resolution variants show that AntiAdv perturbations outperform random noise and maintain legitimate accuracy, with results validated across multiple architectures and dataset subsets. The findings demonstrate DataCook’s practical potential for copyright protection in healthcare data, preserving data utility for authorized deployment while reducing leakage risk, and suggest directions for further refinement and application to broader privacy-preserving goals.
Abstract
In healthcare, copyright protection and unauthorized third-party misuse of data are increasingly significant challenges. Traditional methods for data copyright protection are applied prior to data distribution, which means that models trained on the distributed data are no longer under the copyright holder's control. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by "cooking" the raw data before distribution, enabling the development of models that perform normally on this processed data. During the deployment phase, however, the original test data must also be "cooked" through DataCook to ensure normal model performance. This requirement grants copyright holders control over authorization during the deployment phase. DataCook works by crafting anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv), which aim to confuse models. Like Adv, AntiAdv introduces imperceptible perturbations, so the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D and 3D data as well as the high-resolution variants. The results indicate that DataCook meets its objectives: models trained on AntiAdv cannot analyze unauthorized (uncooked) data effectively, while the validity and accuracy of the data in legitimate scenarios are not compromised. Code and data are available at https://github.com/MedMNIST/DataCook.
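
To make the contrast between Adv and AntiAdv concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy logistic-regression surrogate with hypothetical weights, uses an iterative signed-gradient update, and substitutes a simple L-infinity bound (`eps`) for the SSIM-based perceptual constraint. All function and parameter names (`craft`, `eps`, `step`, `iters`) are illustrative. When no label is supplied, the surrogate's own prediction is used as a pseudo-label. The only difference between the two modes is the sign of the step: AntiAdv descends the loss (raising confidence), while Adv ascends it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic-regression surrogate."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def craft(w, b, x, y=None, eps=0.1, step=0.02, iters=10, anti=True):
    """Craft an anti-adversarial (anti=True) or adversarial (anti=False) input.

    Assumption: an L-infinity ball of radius eps stands in for the
    SSIM-based perceptual constraint described in the text.
    """
    if y is None:
        # Pseudo-label: the surrogate's own prediction on the raw input.
        y = 1 if predict_proba(w, b, x) >= 0.5 else 0
    direction = -1.0 if anti else 1.0  # descend the loss for AntiAdv
    x_new = list(x)
    for _ in range(iters):
        grad = loss_grad_wrt_input(w, b, x_new, y)
        x_new = [xi + direction * step * sign(gi) for xi, gi in zip(x_new, grad)]
        # Project back into the eps-ball around x, then into the valid range [0, 1].
        x_new = [min(max(xn, x0 - eps), x0 + eps) for xn, x0 in zip(x_new, x)]
        x_new = [min(max(xn, 0.0), 1.0) for xn in x_new]
    return x_new


# Hypothetical surrogate weights and a raw input with values in [0, 1].
w, b = [2.0, -1.0, 0.5], -0.2
x_raw = [0.6, 0.4, 0.5]

x_anti = craft(w, b, x_raw, anti=True)
x_adv = craft(w, b, x_raw, anti=False)

print("raw confidence:    ", predict_proba(w, b, x_raw))
print("AntiAdv confidence:", predict_proba(w, b, x_anti))
print("Adv confidence:    ", predict_proba(w, b, x_adv))
```

In this toy setting the AntiAdv input raises the surrogate's confidence in its pseudo-label and the Adv input lowers it, while both stay within `eps` of the raw input. A model trained only on "cooked" inputs would see the confidence-enhancing pattern at training time, which is the intuition behind requiring test data to be cooked as well.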
