Toward effective protection against diffusion-based mimicry through score distillation

Haotian Xue, Chumeng Liang, Xiaoyu Wu, Yongxin Chen

TL;DR

The paper analyzes the risks of diffusion-based mimicry and identifies the image encoder, rather than the denoiser, as the vulnerable point when attacking latent diffusion models (LDMs). It introduces Score Distillation Sampling (SDS) to accelerate protection by avoiding backpropagation through the denoiser, and shows that performing gradient descent (rather than ascent) on the semantic loss yields more natural perturbations that still protect effectively. Extensive experiments across SDEdit, inpainting, and textual inversion demonstrate substantial resource savings (about a 50% reduction in computation time and memory) without sacrificing defense strength, and reveal strong transferability across LDM backbones. The work offers a practical, plug-and-play defense framework that end users can apply to shield their images from diffusion-based mimicry, contributing to more secure AI systems.
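
As a reference point, the semantic loss that AdvDM-style protection ascends, and the SDS approximation of its gradient, can be written as below. This is a sketch in the standard DDPM/DreamFusion form rather than a formula copied from the paper: the noise schedule $\alpha_t, \sigma_t$ and the weighting $w(t)$ are assumptions, while $\mathcal{E}_{\phi}$ and $\epsilon_{\theta}$ follow the notation of the figure captions.

```latex
\begin{aligned}
z &= \mathcal{E}_{\phi}(x), \qquad z_t = \alpha_t z + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathcal{L}_{S}(x) &= \mathbb{E}_{t,\epsilon}\left[ \left\| \epsilon_{\theta}(z_t, t) - \epsilon \right\|_2^2 \right], \\
\nabla_x \mathcal{L}_{S}(x) &= \mathbb{E}_{t,\epsilon}\left[ 2\alpha_t \left( \frac{\partial z}{\partial x} \right)^{\top} \left( \frac{\partial \epsilon_{\theta}(z_t, t)}{\partial z_t} \right)^{\top} \left( \epsilon_{\theta}(z_t, t) - \epsilon \right) \right], \\
\nabla_x \mathcal{L}_{S}(x) &\approx \mathbb{E}_{t,\epsilon}\left[ w(t) \left( \frac{\partial z}{\partial x} \right)^{\top} \left( \epsilon_{\theta}(z_t, t) - \epsilon \right) \right] \quad \text{(SDS)}.
\end{aligned}
```

The last two lines differ only in the denoiser Jacobian $\partial \epsilon_{\theta} / \partial z_t$ (the constant $2\alpha_t$ is absorbed into $w(t)$). The SDS form needs just a forward pass through $\epsilon_{\theta}$ plus backpropagation through the encoder, which is consistent with the roughly halved time and memory cost reported above.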

Abstract

While generative diffusion models excel at producing high-quality images, they can also be misused to mimic authorized images, posing a significant threat to AI systems. Efforts have been made to add calibrated perturbations that protect images from diffusion-based mimicry pipelines. However, most existing methods are ineffective, and even impractical for individual users, because of their high computation and memory requirements. In this work, we present novel findings on attacking latent diffusion models (LDMs) and propose new plug-and-play strategies for more effective protection. In particular, we explore the bottleneck in attacking an LDM and discover that the encoder module, rather than the denoiser module, is the vulnerable point. Based on this insight, we present a strategy using Score Distillation Sampling (SDS) that doubles the speed of protection and halves memory usage without compromising its strength. Additionally, we provide a robust protection strategy that counterintuitively minimizes the semantic loss, which helps generate more natural perturbations. Finally, we conduct extensive experiments to substantiate our findings and comprehensively evaluate the newly proposed strategies. We hope our insights and protective measures contribute to better defense against malicious diffusion-based mimicry, advancing the development of secure AI systems. The code is available at https://github.com/xavihart/Diff-Protect
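
The two strategies in the abstract (using SDS instead of backpropagating through the denoiser, and choosing whether to ascend or descend the semantic loss) can be illustrated with a pure-Python toy. Everything in it is a hypothetical stand-in, not the paper's implementation (which lives in the linked repository): a linear "encoder" with gain `ENC_GAIN`, a "denoiser" that is the exact noise predictor for a data distribution concentrated at the single latent `Z_REF`, the schedule $\bar\alpha_t = 1 - t$, and an $\ell_{\infty}$ budget of 8/255 with a step of 1/255.

```python
import math
import random

# Toy, hypothetical stand-ins for the LDM parts; NOT the paper's models.
ENC_GAIN = 2.0  # toy encoder E_phi: z_i = ENC_GAIN * x_i, so dz_i/dx_i = ENC_GAIN
Z_REF = 0.5     # toy "data distribution": every clean latent equals Z_REF


def encoder(x):
    return [ENC_GAIN * xi for xi in x]


def denoiser(z_t, alpha_t, sigma_t):
    # Exact noise predictor eps_theta for data concentrated at Z_REF.
    return [(zi - alpha_t * Z_REF) / sigma_t for zi in z_t]


def sds_grad(x, rng):
    """One-sample SDS estimate of the semantic-loss gradient w.r.t. x."""
    abar = 1.0 - rng.uniform(0.02, 0.98)  # hypothetical schedule: abar_t = 1 - t
    alpha_t, sigma_t = math.sqrt(abar), math.sqrt(1.0 - abar)
    z = encoder(x)
    eps = [rng.gauss(0.0, 1.0) for _ in z]
    z_t = [alpha_t * zi + sigma_t * ei for zi, ei in zip(z, eps)]
    eps_hat = denoiser(z_t, alpha_t, sigma_t)
    # SDS: treat (eps_hat - eps) as dL/dz with w(t) = 1, i.e. skip the
    # denoiser Jacobian; only the (toy, linear) encoder Jacobian is applied.
    return [(eh - e) * ENC_GAIN for eh, e in zip(eps_hat, eps)]


def sgn(v):
    return (v > 0) - (v < 0)


def pgd_protect(x, budget=8 / 255, step=1 / 255, iters=40, direction=1, seed=0):
    """Sign-gradient PGD in an l_inf ball; direction=+1 ascends, -1 descends."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(iters):
        g = sds_grad(x_adv, rng)
        x_adv = [xa + direction * step * sgn(gi) for xa, gi in zip(x_adv, g)]
        # Project onto the l_inf ball around x, then onto the pixel range [0, 1].
        x_adv = [min(max(xa, xi - budget), xi + budget) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


x = [0.1, 0.4, 0.7, 0.95]
x_plus = pgd_protect(x, direction=1)    # toy analogue of x_adv(+)
x_minus = pgd_protect(x, direction=-1)  # toy analogue of x_adv(-)
```

With this toy denoiser the SDS residual is exactly $\alpha_t (z - z_{\text{ref}}) / \sigma_t$, so ascent pushes each latent away from the toy data point and descent pulls it closer. In a real LDM the residual is noisy and the encoder is a deep network, so the gradient through the encoder has to come from automatic differentiation.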

Paper Structure

This paper contains 33 sections, 9 equations, 19 figures, and 3 tables.

Figures (19)

  • Figure 1: What Should We Focus On When Protecting Against Diffusion-based Mimicry? (a) Generating adversarial samples for LDMs is expensive, requiring over 100 steps of backpropagation through the denoiser $\epsilon_{\theta}$. The gradient of the denoiser tends to be very weak and unstable compared with the strong gradient attacking the encoder, showing that $\epsilon_{\theta}$ is much more robust than the encoder $\mathcal{E}_{\phi}$. (b) After the PGD iterations, the latent $z$-space shows a much larger perturbation than the $x$-space, indicating that $\mathcal{E}_{\phi}$ accounts for the effectiveness of the attack. (c, d) Our proposed design space, which offers much better efficiency and flexibility against three kinds of mimicry.
  • Figure 2: The $z$-Space of LDM is Vulnerable: after protection, the perturbation in $z$-space has a significantly greater magnitude than that in $x$-space. This is common across current protection methods; we show statistical results for (a) AdvDM [liang2023adversarial], (b) PhotoGuard [salman2023raising], and (c) Mist [liang2023mist]. The upper histogram for each method shows the distribution of $\delta_z / \delta_x$ across the four domains of the dataset (anime, artwork, landscape, and portrait), where both $\delta_z$ and $\delta_x$ are computed with the $\ell_{\infty}$ norm and then normalized. The lower part of each method's panel shows the original image $x$, the protected image $x_{adv}$, and their latents $z$ and $z_{adv}$, respectively.
  • Figure 3: Perturbations in Latent Space Reflect the Editing Results: when $x_{adv}$ is generated, $\text{Edit}_{\phi, \theta, \psi}(x_{adv}, t)$ is closely reflected by $\mathcal{D}_{\psi}(\mathcal{E}_{\phi}(x_{adv}))$: both share similarly unrealistic patterns such as blurring, colorful patterns, or the target pattern. This further indicates that the changes in the $z$-space dominate the attack.
  • Figure 4: Directly Attacking the Latent Space Does Not Work: here we show attacks in the latent space with an $\ell_{\infty}$ budget of $0.5$ (normalized; nearly $10$ times larger than the budget in $x$-space). After running PGD attacks with randomly sampled timesteps $t$, the predicted noise is still reasonable, indicating that the attack does little to fool the denoiser.
  • Figure 5: Minimizing Semantic Loss Brings More Natural Protection: [Left] We show the attacked images $x_{adv}$(+) using gradient ascent (red boundary), $x_{adv}$(-) with gradient descent (green boundary), and their perturbations $\delta_{adv}$(+), $\delta_{adv}$(-). [Right] The SDEdit results over the two kinds of protected images, with increasing strength $\text{Edit}(i, ii, iii)$. Zoom in on a computer screen for better visualization.
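
The statistic plotted in Figure 2 is a ratio of two $\ell_{\infty}$ perturbation sizes. The caption does not say how they are normalized, so the sketch below assumes each norm is divided by the value range of its own space (`x_range`, `z_range`); both the helper names and that normalization are hypothetical.

```python
def linf(a, b):
    """l_inf distance between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def delta_ratio(x, x_adv, z, z_adv, x_range=1.0, z_range=1.0):
    """Normalized delta_z / delta_x; x_range and z_range are hypothetical normalizers."""
    delta_x = linf(x, x_adv) / x_range
    delta_z = linf(z, z_adv) / z_range
    if delta_x == 0:
        raise ValueError("x_adv equals x, so the ratio is undefined")
    return delta_z / delta_x


# Flattened toy pixels and latents (illustrative values only).
print(delta_ratio([0.25, 0.5], [0.3125, 0.5], [0.0, 1.0], [0.5, 1.0]))  # → 8.0
```

Computing this ratio for each protected image and grouping the results by domain (anime, artwork, landscape, portrait) would give histograms of the kind the caption describes.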