Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection

Chanhui Lee, Seunghyun Shin, Donggyu Choi, Hae-gon Jeon, Jeany Son

TL;DR

This paper proposes the first universal image immunization framework. Inspired by universal adversarial perturbation techniques used in targeted attacks, it generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines.

Abstract

Recent advances in diffusion models have enabled powerful image editing guided by natural-language prompts, unlocking new creative possibilities. However, they also introduce significant ethical and legal risks, such as deepfakes and the unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet most existing approaches rely on image-specific adversarial perturbations that must be optimized individually for each image, which limits their scalability and practicality. In this paper, we propose the first universal image immunization framework, which generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into the images to be protected. Simultaneously, it suppresses the original content, misdirecting the model's attention during editing. As a result, our approach blocks malicious editing attempts by overwriting the original semantic content of the image via the UAP. Moreover, our method works even in data-free settings, without access to training data or domain knowledge, which further enhances its practicality and applicability in real-world scenarios. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. In addition, despite the inherent difficulty of universal perturbations, our method achieves performance on par with image-specific methods under a more restricted perturbation budget, while also exhibiting strong black-box transferability across different diffusion models.
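The abstract describes the mechanism only at a high level: one perturbation δ, shared by every protected image, is optimized to inject a target semantic while suppressing the original content. The toy sketch below illustrates that structure under loudly stated assumptions. A hypothetical linear "encoder" matrix `W` stands in for the diffusion model's internal features, the target is a fixed feature vector, and the suppression term is a simple surrogate (alignment with each image's own normalized feature direction). The update is standard signed-gradient descent with projection onto an L∞ ball of radius `eps`. This is not the paper's actual loss or algorithm, which operates on a real diffusion editing pipeline; all function names are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Compute m @ v for a dense row-major matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_t(m: Matrix, v: Vector) -> Vector:
    """Compute m.T @ v, used to pull feature-space gradients back to pixel space."""
    return [sum(m[r][c] * v[r] for r in range(len(m))) for c in range(len(m[0]))]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def injection_loss(images: list[Vector], W: Matrix, target: Vector, delta: Vector) -> float:
    """Mean squared distance between the features of x + delta and the target features."""
    total = 0.0
    for x in images:
        feat = matvec(W, [xi + di for xi, di in zip(x, delta)])
        total += sum((f - t) ** 2 for f, t in zip(feat, target))
    return total / len(images)


def optimize_uap(images: list[Vector], W: Matrix, target: Vector, *,
                 eps: float, alpha: float, steps: int, lam: float) -> Vector:
    """Optimize ONE perturbation shared by all images (the 'universal' part).

    Hypothetical toy objective, not the paper's loss:
      injection:   mean_i ||W(x_i + delta) - target||^2
      suppression: lam * mean_i s_i . W(x_i + delta), with s_i = unit(W x_i)
    For simplicity the gradient ignores the [0, 1] clip applied by immunize().
    """
    n, d = len(images), len(images[0])
    delta = [0.0] * d
    # Fixed unit directions of each image's original (source) features.
    src_dirs = []
    for x in images:
        f = matvec(W, x)
        norm = math.sqrt(sum(v * v for v in f)) or 1.0
        src_dirs.append([v / norm for v in f])
    for _ in range(steps):
        grad = [0.0] * d
        for x, s in zip(images, src_dirs):
            feat = matvec(W, [xi + di for xi, di in zip(x, delta)])
            g_inject = matvec_t(W, [2.0 * (f - t) for f, t in zip(feat, target)])
            g_suppress = matvec_t(W, s)
            for j in range(d):
                grad[j] += (g_inject[j] + lam * g_suppress[j]) / n
        # Signed-gradient step, then projection back onto the L-inf ball of radius eps.
        delta = [min(eps, max(-eps, dj - alpha * sign(gj))) for dj, gj in zip(delta, grad)]
    return delta


def immunize(x: Vector, delta: Vector) -> Vector:
    """Apply the precomputed UAP: one addition and a clip to the valid [0, 1] range."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]
```

Because the same δ is reused for every image, protecting a new image at deployment time reduces to `immunize(x, delta)`, an addition and a clip with no per-image optimization. This is the inference-time saving that Figure 1(a) contrasts with image-specific methods.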

Paper Structure (58 sections, 9 equations, 21 figures, 25 tables, 3 algorithms)

Figures (21)

  • Figure 1: Illustrations of our universal immunization approach and the motivation behind semantic injection. (a) Unlike image-specific approaches (top) requiring costly per-image processing, our universal immunization method (bottom) employs a single, pre-computed UAP to safeguard images without any inference-time overhead. (b) Injecting the target content into source images via a UAP misleads the editing model, causing it to lose the original content and resulting in failed edits. Note that perturbations are scaled for better visualization of details.
  • Figure 2: Visualization of cross-attention maps from the diffusion model under different conditions. (a) Target images generated by Stable Diffusion [rombach2022high] conditioned on the prompt 'Ronaldo', along with the corresponding generated UAP. (b) Source images show strong attention to their own content, but because they do not intrinsically contain the target semantics ('Ronaldo'), they produce no target-aligned attention. (c) Immunized images, perturbed by the UAP, fail to focus on the original prompt and instead attend strongly to the target concept 'Ronaldo'. Each attention map is annotated with the prompt used for conditioning.
  • Figure 3: Qualitative comparison with universal baselines and image-specific methods. 'Enc.', 'Emb.', and 'Map' denote the universal baselines, obtained by adapting image-specific methods: the encoder attack [raising], intermediate representation disruption (e.g., of query/key/value embeddings) [advpaint], and the attention map attack [semantic], respectively. 'EA', 'DA', and 'SA' denote the image-specific immunization methods Encoder-Attack and Decoder-Attack [raising] and Semantic-Attack [semantic]. We use 'Ronaldo' as the target prompt for UAP generation.
  • Figure 4: Examples of our generated evaluation dataset.
  • Figure 5: Visualization for targets and their corresponding UAPs. UAPs are scaled for visualization.
  • ...and 16 more figures
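Figure 2 reasons in terms of cross-attention maps: for each prompt token, how strongly each spatial location of the image attends to it. In Stable Diffusion's UNet, cross-attention uses queries computed from image features and keys computed from the text-encoder embeddings of the prompt. The minimal sketch below computes a single-head attention map for one token. The toy 2×2 "image", the two-dimensional keys, and the helper names are hypothetical; a real pipeline involves many heads, layers, and denoising steps, and how the paper aggregates them is not stated here.

```python
import math


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list[list[float]], keys: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product cross-attention probabilities.

    queries: one vector per spatial position (image patch / latent pixel)
    keys:    one vector per text token of the conditioning prompt
    Returns probs[p][j], the share of position p's attention that goes to token j.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys])
        for qv in queries
    ]


def token_map(probs: list[list[float]], token: int, height: int, width: int) -> list[list[float]]:
    """Collect one token's column of probabilities and reshape it into a height x width map."""
    column = [row[token] for row in probs]
    return [column[r * width:(r + 1) * width] for r in range(height)]
```

For example, with keys `[[1, 0], [0, 1]]` for a source token and a target token, positions whose queries align with the target key receive most of that token's attention in `token_map(probs, 1, 2, 2)`. Figure 2(c) describes the analogous effect in the real model, where the UAP shifts attention from the original prompt toward the target concept.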