Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

Chun-Yen Shih, Li-Xuan Peng, Jia-Wei Liao, Ernie Chu, Cheng-Fu Chou, Jun-Cheng Chen

TL;DR

This work tackles the risk of malicious diffusion-based image editing by developing AtkPDM, an attack against Pixel-domain Diffusion Models (PDMs). It introduces a feature-attacking loss operating on denoising UNet representations and a fidelity constraint, complemented by latent optimization via a pretrained VAE to preserve image naturalness, formulated and solved with alternating optimization. The approach achieves state-of-the-art attack performance on PDMs (and transfers to LDMs), while remaining robust to common defenses such as purification, cropping, and JPEG compression. This reveals a vulnerability in UNet-based diffusion models and provides a practical image-protection mechanism against diffusion-based editing, with potential implications for safety and IP protection in visual content.

Abstract

Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations. These methods are costly and specifically target prevalent Latent Diffusion Models (LDMs), while Pixel-domain Diffusion Models (PDMs) remain largely unexplored and robust against such attacks. Our work addresses this gap by proposing a novel attack framework, AtkPDM. AtkPDM is mainly composed of a feature representation attacking loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images. Extensive experiments demonstrate the effectiveness of our approach in attacking dominant PDM-based editing methods (e.g., SDEdit) while maintaining reasonable fidelity and robustness against common defense methods. Additionally, our framework is extensible to LDMs, achieving comparable performance to existing approaches.

Paper Structure (45 sections, 20 equations, 9 figures, 7 tables, 2 algorithms)

Figures (9)

  • Figure 1: Overview of our attack scenario. Diffusion-based image editing can generate high-quality image variations based on the clean input image. However, adding a carefully crafted perturbation to the clean image disrupts the diffusion process, producing a corrupted image or an image whose semantics are unrelated to the original.
  • Figure 2: Conceptual illustration of our method. We randomly forward both the clean image $\mathbf{x}$ and the adversarial image $\mathbf{x}^{\text{adv}}$ to noise level $t$, then use our feature attacking loss to maximize the feature distance between the noisy latents $\mathbf{x}_t$ and $\mathbf{x}^{\text{adv}}_t$ in the reverse process of diffusion models, while imposing our fidelity loss as a constraint to keep the adversarial image from deviating from the original image. We update $\mathbf{x}^{\text{adv}}$ in latent space instead of in pixel space to ensure its naturalness.
  • Figure 3: Overview of our AtkPDM$^{+}$ algorithm. Starting from the latent $\mathbf{z}^{\text{adv}}$ of the initial adversarial image, we first decode back to the pixel domain, perform forward diffusion on both $\mathbf{x}$ and $\mathbf{x}^{\text{adv}}$, and feed them to the frozen victim UNet. We then extract the feature representation of the UNet's middle block to calculate our $\mathcal{L}_\text{attack}$, aiming to distract the recognition of image semantics. We also calculate our $\mathcal{L}_\text{fidelity}$ in the pixel domain to constrain the optimization. Finally, $\mathbf{z}^{\text{adv}}$ is alternately updated by the gradients of the two losses.
  • Figure 4: Qualitative results compared to the previous methods. Our adversarial images can effectively corrupt the edited results without significant fidelity decrease. The same column shares the same random seed for fair comparisons.
  • Figure 5: Loss curves of our $\mathcal{L}_\text{attack}$ and $\mathcal{L}_\text{fidelity}$ against optimization step.
  • ...and 4 more figures
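
The alternating update described in the captions of Figures 2 and 3 can be illustrated with a small NumPy toy. This is only a sketch under strong assumptions, not the authors' implementation: the linear maps `D` (standing in for the VAE decoder) and `F` (standing in for the UNet middle-block features), the squared L2 distances used for both losses, the shared noise sample, the sampling range for the noise level, and all step sizes and the budget `delta` are hypothetical choices made so the gradients can be written by hand. Only the forward-diffusion formula is the standard DDPM one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins, NOT the paper's networks.
LATENT_DIM, PIXEL_DIM, FEAT_DIM = 4, 16, 8
D = rng.normal(size=(PIXEL_DIM, LATENT_DIM)) / 4.0  # linear "VAE decoder": z -> x
F = rng.normal(size=(FEAT_DIM, PIXEL_DIM)) / 4.0    # linear "UNet middle-block features"


def forward_diffuse(x, alpha_bar_t, eps):
    """Standard DDPM forward process: x_t = sqrt(a_bar) * x + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps


def attack_loss_and_grad(z_adv, x, alpha_bar_t, eps):
    """Squared feature distance between noisy adversarial and clean images,
    plus its analytic gradient with respect to the latent z_adv."""
    x_adv = D @ z_adv
    # Sharing eps is an assumption. With this linear F the noise cancels;
    # with a real, non-linear UNet it would not.
    diff = F @ (forward_diffuse(x_adv, alpha_bar_t, eps)
                - forward_diffuse(x, alpha_bar_t, eps))
    grad = 2.0 * np.sqrt(alpha_bar_t) * (D.T @ (F.T @ diff))
    return float(diff @ diff), grad


def fidelity_loss_and_grad(z_adv, x):
    """Squared pixel-domain distance to the clean image, plus its gradient."""
    residual = D @ z_adv - x
    return float(residual @ residual), 2.0 * (D.T @ residual)


def protect_sketch(z_clean, steps=200, lr_attack=0.1, lr_fidelity=0.1, delta=0.05):
    """Alternate between ascending the attack loss and descending the
    fidelity loss until the fidelity budget delta is met again."""
    x = D @ z_clean                                   # clean image (toy)
    z_adv = z_clean + 1e-3 * rng.normal(size=z_clean.shape)
    for _ in range(steps):
        alpha_bar_t = rng.uniform(0.3, 0.9)           # random noise level (assumed range)
        eps = rng.normal(size=x.shape)
        _, g_attack = attack_loss_and_grad(z_adv, x, alpha_bar_t, eps)
        z_adv = z_adv + lr_attack * g_attack          # gradient ascent: push features apart
        fid, g_fid = fidelity_loss_and_grad(z_adv, x)
        while fid > delta:                            # gradient descent: restore fidelity
            z_adv = z_adv - lr_fidelity * g_fid
            fid, g_fid = fidelity_loss_and_grad(z_adv, x)
    return z_adv, x


if __name__ == "__main__":
    z_adv, x = protect_sketch(np.ones(LATENT_DIM))
    print("fidelity loss:", fidelity_loss_and_grad(z_adv, x)[0])
    print("attack loss:  ", attack_loss_and_grad(z_adv, x, 0.5, np.zeros(PIXEL_DIM))[0])
```

In this toy, the update is applied to the latent `z_adv` rather than to pixels, mirroring the latent-space optimization in the captions. The inner `while` loop plays the role of the fidelity constraint, so the returned latent always decodes to an image within the budget `delta` of the clean one.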