Diffusion Attack: Leveraging Stable Diffusion for Naturalistic Image Attacking

Qianyu Guo; Jiaming Fu; Yawen Lu; Dongming Gan

TL;DR

The paper addresses adversarial security in VR, where typical attack images are visually conspicuous. It introduces Diffusion Attack, a framework that combines neural style transfer with a latent diffusion model to produce natural-looking adversarial images. A mask constrains the edited region, and a joint loss $l_{total}$ combines $l_{content}$, $l_{style}$, $l_{adv}$, and $l_{smooth}$. Style images are generated from text prompts with Stable Diffusion, and the style-transferred output is optimized for targeted misclassification by classifiers such as Inception V3. The approach achieves high perceptual quality as measured by no-reference image quality assessment (NR-IQA) metrics and related aesthetics scores. The work demonstrates that naturalistic adversarial examples can preserve semantic integrity while maintaining strong attack efficacy, which has implications for VR security and underlines the need for robust defenses against style-transfer-based attacks.
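The joint loss and the mask constraint mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the weights `w_*`, the total-variation form of the smoothness term, the linear mask blending, and all function names are assumptions made for illustration, since this summary does not give the paper's exact definitions. Images are plain nested lists of floats to keep the example dependency-free.

```python
from typing import List

Image = List[List[float]]  # grayscale image as nested lists (illustration only)


def total_loss(l_content: float, l_style: float, l_adv: float, l_smooth: float,
               w_content: float = 1.0, w_style: float = 1.0,
               w_adv: float = 1.0, w_smooth: float = 1.0) -> float:
    """Weighted sum of the four loss terms; the weights are hypothetical."""
    return (w_content * l_content + w_style * l_style
            + w_adv * l_adv + w_smooth * l_smooth)


def smoothness_loss(img: Image) -> float:
    """Assumed total-variation-style penalty: sum of squared differences
    between each pixel and its right and lower neighbours."""
    h, w = len(img), len(img[0])
    loss = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                loss += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                loss += (img[y + 1][x] - img[y][x]) ** 2
    return loss


def apply_mask(original: Image, stylized: Image, mask: Image) -> Image:
    """Keep original pixels where mask is 0 and stylized pixels where mask is 1."""
    return [[m * s + (1 - m) * o for o, s, m in zip(row_o, row_s, row_m)]
            for row_o, row_s, row_m in zip(original, stylized, mask)]


if __name__ == "__main__":
    original = [[0.0, 0.0], [0.0, 0.0]]
    stylized = [[1.0, 1.0], [1.0, 1.0]]
    mask = [[1.0, 0.0], [0.0, 0.0]]  # only the top-left pixel may be edited
    blended = apply_mask(original, stylized, mask)
    print(blended, smoothness_loss(blended))
```

In an actual attack the four terms would be computed from network features and classifier outputs and minimized by gradient descent; the sketch only shows how the terms combine and how a mask restricts edits to the target region.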

Abstract

In Virtual Reality (VR), adversarial attacks remain a significant security threat. Most deep learning-based methods for physical and digital adversarial attacks focus on enhancing attack performance by crafting adversarial examples that contain large printable distortions, which are easy for human observers to identify. Attackers rarely constrain the naturalness and visual comfort of the generated attack image, so the resulting attacks are noticeable and unnatural. To address this challenge, we propose a framework that incorporates style transfer to craft adversarial inputs in natural styles that exhibit minimal detectability and maximum natural appearance, while maintaining superior attack capabilities.

Paper Structure (4 sections, 1 equation, 3 figures, 1 table)

Introduction
Diffusion Attack
Experiment and Preliminary Results
Conclusion

Figures (3)

Figure 1: Overview of the proposed Diffusion Attack. A style image is generated from a text prompt, and the adversarial attack model then causes the style-transferred image to be misclassified as an umbrella instead of the original T-shirt. Existing attack methods, by contrast, attack the system with noticeable and unnatural noise.
Figure 2: Style-transferred results for different provided style images. Diffusion Attack has advantages in rendering realistic textures and coherent structures on the target object and region.
Figure 3: Attack performance, showing the expected misclassification label and its probability. Diffusion Attack can generate images with various adversarial patterns that combine different textures and colors.
