Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?

Zhengyue Zhao, Jinhao Duan, Kaidi Xu, Chenan Wang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu

TL;DR

This work critically evaluates protective perturbations designed to prevent Stable Diffusion from exploiting personal data during fine-tuning, under a realistic threat model. It shows that protection is fragile in practice: its effectiveness is highly sensitive to the fine-tuning method, the fraction of protected data, and common natural transformations. The authors introduce GrIDPure, a grid-based purification framework that removes protective perturbations while preserving the structure of high-resolution images, and that preserves image quality better than prior DiffPure-based approaches. Stable Diffusion can still learn effectively from the purified images, which underscores the limited efficacy of perturbation-based defenses and the need for stronger privacy-preserving mechanisms. Overall, the study provides a practical framework for evaluating protections and offers GrIDPure as a more robust purification tool, while highlighting ongoing challenges in safeguarding visual data against advanced generative systems.
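The grid-based idea can be illustrated with a minimal sketch. It assumes a purifier that works best at a smaller resolution than the full image. The image is split into overlapping square tiles, each tile is purified independently, and overlapping outputs are averaged back into a full-resolution image. The tile size, stride, averaging rule, and the `purify_tile` callable are hypothetical stand-ins, not the authors' implementation. In GrIDPure, the per-tile step would be a diffusion-based (DiffPure-style) purifier, which is not implemented here.

```python
import numpy as np


def tile_origins(size: int, tile: int, stride: int) -> list[int]:
    """Start offsets along one axis so that tiles cover [0, size) completely."""
    if tile > size:
        raise ValueError("tile must not exceed image size")
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:  # make sure the last tile touches the border
        starts.append(size - tile)
    return starts


def grid_purify(image, purify_tile, tile=256, stride=128):
    """Purify each overlapping tile independently and average the overlaps.

    image: float array of shape (H, W, C).
    purify_tile: hypothetical callable mapping a (tile, tile, C) array to an
        array of the same shape (e.g. a diffusion-based purifier).
    """
    h, w = image.shape[:2]
    out = np.zeros_like(image, dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    for y in tile_origins(h, tile, stride):
        for x in tile_origins(w, tile, stride):
            patch = image[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] += purify_tile(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return out / weight  # every pixel is covered by at least one tile


# Usage with a placeholder "purifier" that only clips values to [0, 1].
img = np.random.default_rng(0).uniform(-0.1, 1.1, size=(512, 512, 3))
purified = grid_purify(img, lambda p: np.clip(p, 0.0, 1.0))
```

Averaging the overlapping outputs smooths seams between neighbouring tiles. The merging strategy GrIDPure actually uses may differ.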

Abstract

Stable Diffusion has established itself as a foundation model in generative AI artistic applications, attracting widespread research and practical use. Recent fine-tuning methods have made it feasible for individuals to implant personalized concepts into the base Stable Diffusion model at minimal computational cost using small datasets. However, these innovations have also given rise to issues such as facial privacy forgery and artistic copyright infringement. Recent studies have explored adding imperceptible adversarial perturbations to images to prevent unauthorized exploitation and infringement when personal data is used to fine-tune Stable Diffusion. Although these studies have demonstrated the ability to protect images, these methods may not be fully applicable in real-world scenarios. In this paper, we systematically evaluate the use of perturbations to protect images within a practical threat model. The results suggest that these approaches may not be sufficient to safeguard image privacy and copyright effectively. Furthermore, we introduce a purification method capable of removing protective perturbations while preserving the original image structure to the greatest extent possible. Experiments reveal that Stable Diffusion can effectively learn from purified images across all protective methods.
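Imperceptible protective perturbations of this kind are commonly crafted by projected gradient ascent. The protector repeatedly nudges the image in the direction that increases a training loss. After each step, it projects the image back into a small L-infinity ball so the change stays imperceptible. The sketch below shows only that generic loop on a toy differentiable loss. The loss, the step size `alpha`, the budget `eps`, and the step count are illustrative assumptions. Methods such as AdvDM and Anti-DreamBooth instead use gradients derived from the Stable Diffusion training objective.

```python
import numpy as np


def pgd_protect(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generic L-infinity projected gradient ascent.

    x: clean image with values in [0, 1].
    grad_fn: callable returning d(loss)/d(x) at a given image; here a toy
        stand-in for the gradient of a diffusion training loss.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)          # stay within the budget
        x_adv = np.clip(x_adv, 0.0, 1.0)                   # stay a valid image
    return x_adv


# Toy loss L(x) = ||x - target||^2, whose gradient is 2 * (x - target).
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, size=(64, 64, 3))
target = np.full_like(clean, 0.5)
protected = pgd_protect(clean, lambda z: 2.0 * (z - target))
print(np.abs(protected - clean).max() <= 8 / 255 + 1e-12)  # True
```

The projection step keeps every pixel within `eps` of the clean image, so the loss increases without the perturbation growing visible. This bounded-but-small change is also why such perturbations can be fragile under purification and natural transformations.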

Paper Structure

This paper contains 58 sections, 10 equations, 29 figures, 8 tables, and 2 algorithms.

Figures (29)

  • Figure 1: Overview of protective perturbations and of protection failing against exploitation by Stable Diffusion models.
  • Figure 2: Visualization of protective effectiveness of Anti-DreamBooth toward different fine-tuning methods on the CelebA-HQ dataset with prompt "a photo of a sks person".
  • Figure 3: Visualization of protective effectiveness of AdvDM toward different fine-tuning methods on the WikiArt dataset with prompt "a painting in the style of Monet".
  • Figure 4: Average parameter changes ($\Delta\theta$) across layers of Stable Diffusion fine-tuned with clean and protected images.
  • Figure 5: Protective effectiveness of AdvDM and Anti-DreamBooth under different protective ratios on the CelebA-HQ dataset.
  • ...and 24 more figures
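Figure 4 compares average parameter changes across layers. The sketch below computes one simple statistic of this kind. It assumes the statistic is the mean absolute difference between fine-tuned and original weights for each named layer. The layer names and the exact averaging rule are hypothetical, and the paper's definition of $\Delta\theta$ may differ.

```python
import numpy as np


def mean_param_change(base, tuned):
    """Mean absolute parameter change per layer.

    base, tuned: dicts mapping layer names to weight arrays of equal shape,
    e.g. weights before and after fine-tuning.
    """
    if base.keys() != tuned.keys():
        raise ValueError("both models must have the same layers")
    return {name: float(np.mean(np.abs(tuned[name] - base[name]))) for name in base}


# Hypothetical two-layer example.
base = {"attn.to_q": np.zeros((4, 4)), "attn.to_k": np.ones((2, 3))}
tuned = {"attn.to_q": np.full((4, 4), 0.5), "attn.to_k": np.full((2, 3), 1.25)}
print(mean_param_change(base, tuned))  # {'attn.to_q': 0.5, 'attn.to_k': 0.25}
```

Computing this dictionary separately for a model fine-tuned on clean images and one fine-tuned on protected images gives a layer-by-layer comparison in the spirit of Figure 4.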