Medical Unlearnable Examples: Securing Medical Data from Unauthorized Training via Sparsity-Aware Local Masking

Weixiang Sun, Yixin Liu, Zhiling Yan, Kaidi Xu, Lichao Sun

TL;DR

This paper proposes Sparsity-Aware Local Masking (SALM), a novel approach that perturbs only significant pixel regions rather than the entire image, as previous methods do. This significantly narrows the search space for perturbations and fully exploits the sparsity of medical images.

Abstract

The rapid expansion of AI in healthcare has led to a surge in medical data generation and storage, boosting medical AI development. However, fears of unauthorized use, such as training commercial AI models, deter researchers from sharing their valuable datasets. To encourage data sharing, one promising solution is to introduce imperceptible noise into the data. This approach aims to safeguard the data against unauthorized training by degrading the generalization ability of any model trained on it. However, existing methods of this kind are neither effective nor efficient when applied to medical data, mainly because they ignore the sparse nature of medical images. To address this problem, we propose the Sparsity-Aware Local Masking (SALM) method, a novel approach that selectively perturbs significant pixel regions rather than the entire image, as previous methods do. By focusing on local areas, this simple yet effective approach significantly narrows the search space for perturbations and fully exploits the sparsity of the data. Our extensive experiments across various datasets and model architectures demonstrate that SALM effectively prevents unauthorized training of different models and outperforms previous SoTA data protection methods.

Paper Structure (16 sections, 6 equations, 7 figures, 7 tables, 1 algorithm)

Figures (7)

  • Figure 1: The SALM framework comprises two primary steps: important-pixel acquisition and noise-generator training. In the first step, the model computes the gradient at each pixel of the image and ranks the pixels accordingly, producing a sparse mask determined by a preset $K$ value. In the second step, the noise generator perturbs only the pixels selected in the first step and updates its parameters. With this noise applied, models trained without authorization perform poorly on clean datasets, whereas performance for authorized users remains comparable to that achieved with the original data.
  • Figure 2: The learning curves of ResNet-18 trained on different protected data.
  • Figure 3: Protection effectiveness for the selected categories under different models.
  • Figure 4: The effect of $K$ on clean test accuracy (%) for the four datasets.
  • Figure 5: The protection framework in the case of the class-combined dataset. Noise generated for a given class still retains its protective effect when that class's data is combined with other support classes.
  • ...and 2 more figures
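
The two steps described in the Figure 1 caption (rank per-pixel gradients, keep the top $K$ as a sparse mask, then perturb only the masked pixels) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `top_k_mask` and `apply_masked_noise`, ranking by absolute gradient magnitude, treating $K$ as a pixel count, and the bound `eps` are all assumptions made for the example. The gradient array is assumed to come from an autodiff framework and is passed in directly.

```python
import numpy as np

def top_k_mask(grad, k):
    """Boolean mask marking the k entries of `grad` with the largest magnitude.

    Assumption: "important" pixels are those with the largest |gradient|.
    """
    flat = np.abs(grad).ravel()
    k = int(min(max(k, 0), flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    if k > 0:
        # argpartition places the k largest magnitudes in the last k positions
        idx = np.argpartition(flat, flat.size - k)[flat.size - k:]
        mask[idx] = True
    return mask.reshape(grad.shape)

def apply_masked_noise(image, noise, mask, eps=8 / 255):
    """Add noise only where `mask` is True, bounded by eps (hypothetical budget)."""
    delta = np.clip(noise, -eps, eps) * mask
    # Keep the protected image in the valid [0, 1] intensity range
    return np.clip(image + delta, 0.0, 1.0)

# Toy 2x2 example with a hand-written gradient
grad = np.array([[0.1, -0.9], [0.5, 0.0]])
mask = top_k_mask(grad, k=2)
image = np.full((2, 2), 0.5)
noise = np.full((2, 2), 1.0)
protected = apply_masked_noise(image, noise, mask, eps=0.1)
```

In this toy case only the two pixels with the largest gradient magnitude are modified and the rest of the image is left untouched, which is the sense in which a local mask shrinks the perturbation search space.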