Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think

Haotian Xue, Yongxin Chen

TL;DR

This work shows that pixel-space diffusion models (PDMs) exhibit far stronger adversarial robustness than latent-space diffusion models (LDMs), challenging the prevailing view that diffusion models are uniformly vulnerable. Through extensive experiments across multiple LDMs and PDMs, the authors demonstrate that gradient-based white-box attacks designed for LDMs fail to meaningfully attack PDMs, due to the denoising dynamics in pixel space. Building on this, they introduce PDM-Pure, a simple SDEdit-based purifier that can remove protective perturbations across a range of attack methods and image resolutions, effectively restoring editability and imitation capabilities. The findings imply that pixel-space diffusion processes constitute a fundamental barrier to adversarial manipulation, complicating current protection strategies and suggesting new directions for robust defense design. Overall, the paper reframes adversarial samples in diffusion models and proposes a practical universal purifier with broad implications for diffusion-based safety and security.

Abstract

Adversarial examples for diffusion models are widely used as a solution to safety concerns. By adding adversarial perturbations to personal images, one can prevent attackers from easily editing or imitating them. However, all of these protections target latent diffusion models (LDMs), while adversarial examples for diffusion models operating in pixel space (PDMs) have been largely overlooked. This may mislead us into thinking that diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we present a novel finding: although gradient-based white-box attacks can be used to attack LDMs, they fail to attack PDMs. This finding is supported by extensive experiments with a wide range of attack methods on various PDMs and LDMs with different model structures, which means that diffusion models are indeed much more robust against adversarial attacks. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adversarial patterns generated on LDMs to protect images, which means that most current protection methods, to some extent, cannot protect our images from malicious attacks. We hope our insights will inspire the community to rethink adversarial samples for diffusion models as protection methods and move toward more effective protection. Code is available at https://github.com/xavihart/PDM-Pure.
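Protective perturbations of the kind described above are commonly crafted with projected gradient descent (PGD) under an L-infinity budget, such as the $\delta=16/255$ quoted for Figure 3. The following is a minimal, generic L-infinity PGD sketch, not the authors' code. `grad_fn` is a hypothetical callback that, in a real protection method, would back-propagate some loss through an LDM (for example its encoder or denoising objective); the linear toy loss in the usage example is purely illustrative.

```python
from typing import Callable, List

Vector = List[float]


def pgd_linf(
    x0: Vector,
    grad_fn: Callable[[Vector], Vector],
    eps: float = 16 / 255,
    step_size: float = 2 / 255,
    steps: int = 40,
) -> Vector:
    """Maximize a loss around x0 with L-infinity projected gradient ascent.

    grad_fn(x) is assumed to return d(loss)/dx; pixels are assumed to lie in [0, 1].
    """
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        # Ascent step along the sign of the gradient (the usual L-inf PGD step).
        x = [xi + step_size * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Project back into the eps-ball around x0 ...
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
        # ... and into the valid pixel range.
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


# Toy usage: a linear loss sum(w_i * x_i) has the constant gradient w.
w = [1.0, -2.0, 0.0]
protected = pgd_linf([0.5, 0.5, 0.5], lambda x: w)
```

With a linear loss, the perturbation saturates at the budget in the direction of `sign(w)`; for a real LDM the gradient changes at every step, which is why the loop re-evaluates `grad_fn` each iteration.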

Paper Structure (44 sections, 10 equations, 12 figures, 2 tables)

Figures (12)

  • Figure 1: Pixel is a Barrier for Attacking DMs: (a) Pixel-based diffusion models are harder to attack with white-box attacks such as projected gradient descent (PGD) than latent-space diffusion models. (b) A strong PDM can be used as a universal purifier to effectively remove the protective perturbations generated by existing protection methods. (c) Pixel is a barrier: pixel-space diffusion models are quite robust, so real safety and protection cannot be achieved unless pixel-space diffusion can also be attacked.
  • Figure 2: PDMs Cannot be Attacked as LDMs: (a) LDMs can be easily fooled, but PDMs cannot. (b) Even an end-to-end attack does not work on PDMs. (Best viewed zoomed in.)
  • Figure 3: PDM-Pure is Easy to Design: (a) PDM-Pure applies SDEdit [meng2021sdedit] in pixel space: it first runs forward diffusion to a small step $t^{*}$ and then runs the denoising process. (b) We adapt the framework to DeepFloyd-IF [deepfloyd], one of the strongest PDMs. PDM-Pure can effectively remove strong protective perturbations (e.g., $\delta=16/255$). The tested images are $512\times 512$.
  • Figure 4: PDM-Pure makes the Protected Images no longer Protected: Here we show qualitative results of PDM-Pure in three scenarios where unauthorized editing may occur: (a) inpainting, (b) Textual Inversion [textualinversion], and (c) LoRA customization [lora]. While the protected images lead to poor generation quality, the purified ones fully bypass the protection.
  • Figure 5: PDMs cannot be Attacked as LDMs: we conduct experiments on various models with various budgets; even the largest budget does not affect the PDMs, showing that PDMs are adversarially robust. In each block, the first column is the attacked image, and the second and third columns are edited images, with the third column using a larger editing strength.
  • ...and 7 more figures
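The two-stage procedure in Figure 3(a), diffusing forward to a small step $t^{*}$ and then denoising, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PDM-Pure implementation: it assumes a DDPM-style linear noise schedule and deterministic DDIM reverse steps, whereas the paper's actual sampler, schedule, and choice of $t^{*}$ for DeepFloyd-IF are not given here. `predict_x0` is a hypothetical hook where a pretrained pixel-space model would go; the Gaussian-prior denoiser exists only to make the example runnable.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar[t] of a linear beta schedule (assumed), with alpha_bar[0] = 1."""
    bars = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        bars.append(bars[-1] * (1.0 - beta))
    return bars


def gaussian_prior_denoiser(mu: float, s: float, alpha_bars: List[float]) -> Callable[[Vector, int], Vector]:
    """Toy stand-in for a pretrained PDM: exact E[x_0 | x_t] if every pixel of x_0 is i.i.d. N(mu, s^2)."""
    def predict_x0(x_t: Vector, t: int) -> Vector:
        a = math.sqrt(alpha_bars[t])
        b2 = 1.0 - alpha_bars[t]
        gain = a * s * s / (a * a * s * s + b2)
        return [mu + gain * (xt - a * mu) for xt in x_t]
    return predict_x0


def sdedit_purify(
    x: Vector,
    predict_x0: Callable[[Vector, int], Vector],
    t_star: int,
    alpha_bars: List[float],
    rng: random.Random,
) -> Vector:
    """SDEdit-style purification: noise x up to step t_star, then denoise back to step 0."""
    # Forward diffusion in closed form: x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise.
    a = alpha_bars[t_star]
    x_t = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x]
    for t in range(t_star, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        x0_hat = predict_x0(x_t, t)
        # Noise implied by the current state and the x_0 estimate.
        eps_hat = [(xt - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t) for xt, x0 in zip(x_t, x0_hat)]
        # Deterministic DDIM (eta = 0) step from t to t - 1; at t - 1 = 0 this returns x0_hat.
        x_t = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_hat, eps_hat)]
    return x_t


# Usage: purify a toy "protected" image with a hypothetical t_star = 100.
bars = linear_alpha_bars()
denoiser = gaussian_prior_denoiser(mu=0.5, s=0.1, alpha_bars=bars)
protected = [0.5 + 16 / 255, 0.5 - 16 / 255, 0.5]
purified = sdedit_purify(protected, denoiser, t_star=100, alpha_bars=bars, rng=random.Random(0))
```

Keeping $t^{*}$ small is the key design choice: the injected noise only needs to drown out a perturbation of size around $\delta$, so the denoiser can restore the image while preserving most of its content. Setting `t_star=0` returns the input unchanged.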