GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors

Yaopei Zeng, Yuanpu Cao, Lu Lin

TL;DR

GuardDoor tackles the risk of unauthorized diffusion-based image editing by combining image-side protective triggers with a model-side backdoor. A pre-trained VAE generates imperceptible triggers $h_\phi(\cdot)$, and the diffusion encoder $f_\theta$ is fine-tuned to map trigger-containing images to a predefined meaningless output, while preserving normal editing for clean images, via the objective $\min_\theta \mathcal{L}(\theta)= \mathcal{L}(g_\psi(f_\theta(\bm x)), \bm x) + \alpha \mathcal{L}(f_\theta(h_\phi(\bm x)), f_\theta(\bm x_{\text{tar}}))$. This model-centric defense remains robust to preprocessing and scales to large datasets, outperforming prior perturbation-based methods under attacks like DiffPure and IMPRESS, with low runtime overhead. The approach enables practical, cooperative protection for digital content in the era of generative AI by ensuring protected images yield meaningless edits while unprotected ones are editable as usual. Future work may extend GuardDoor to more editing modalities, including masked edits, and to additional adversarial scenarios.
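The objective has two terms. The first is a utility term that keeps clean images reconstructable through the decoder. The second is a backdoor term that pulls the encoding of a triggered image toward the encoding of the meaningless target. The sketch below assumes mean squared error as the generic loss $\mathcal{L}$ and uses toy vector functions in place of $f_\theta$, $g_\psi$, and $h_\phi$. The helper names `mse` and `guarddoor_objective` are hypothetical and do not come from the paper.

```python
from typing import Callable, List

Vector = List[float]  # a flattened image or latent (toy stand-in)


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error, assumed here as the generic loss L."""
    assert len(a) == len(b), "inputs must have the same length"
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def guarddoor_objective(
    x: Vector,
    x_tar: Vector,
    f_theta: Callable[[Vector], Vector],  # image encoder being fine-tuned
    g_psi: Callable[[Vector], Vector],    # decoder used for the utility term
    h_phi: Callable[[Vector], Vector],    # trigger generator (pre-trained VAE)
    alpha: float,
) -> float:
    # Utility term: a clean image should survive encode -> decode unchanged.
    utility = mse(g_psi(f_theta(x)), x)
    # Backdoor term: a triggered image should encode like the target (e.g. a black image).
    backdoor = mse(f_theta(h_phi(x)), f_theta(x_tar))
    return utility + alpha * backdoor


if __name__ == "__main__":
    identity = lambda v: list(v)               # hypothetical untrained encoder/decoder
    add_trigger = lambda v: [p + 0.01 for p in v]  # hypothetical additive trigger
    x = [0.2, 0.4, 0.6]
    black = [0.0, 0.0, 0.0]
    print(guarddoor_objective(x, black, identity, identity, add_trigger, alpha=1.0))
```

With identity stand-ins, the utility term is already zero, so the whole loss comes from the backdoor term. Minimizing the objective over $\theta$ would push $f_\theta(h_\phi(\bm x))$ toward $f_\theta(\bm x_{\text{tar}})$, and $\alpha$ sets how strongly protection is traded against editing utility.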

Abstract

The growing accessibility of diffusion models has revolutionized image editing but also raised significant concerns about unauthorized modifications, such as misinformation and plagiarism. Existing countermeasures largely rely on adversarial perturbations designed to disrupt diffusion model outputs. However, these approaches are found to be easily neutralized by simple image preprocessing techniques, such as compression and noise addition. To address this limitation, we propose GuardDoor, a novel and robust protection mechanism that fosters collaboration between image owners and model providers. Specifically, the model provider participating in the mechanism fine-tunes the image encoder to embed a protective backdoor, allowing image owners to request the attachment of imperceptible triggers to their images. When unauthorized users attempt to edit these protected images with this diffusion model, the model produces meaningless outputs, reducing the risk of malicious image editing. Our method demonstrates enhanced robustness against image preprocessing operations and is scalable for large-scale deployment. This work underscores the potential of cooperative frameworks between model providers and image owners to safeguard digital content in the era of generative AI.

Paper Structure

This paper contains 18 sections, 1 equation, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Demonstration of how GuardDoor safeguards images against unauthorized edits. Left: Unauthorized users may bypass adversarial perturbation-based protections with image preprocessing, successfully misusing public diffusion models for malicious editing. Right: By collaborating with model providers, GuardDoor embeds protective triggers into images and injects protective backdoors into diffusion models, ensuring that protection remains effective even after image preprocessing.
  • Figure 2: Overview of GuardDoor's protection mechanism. The process begins with the generation of protective triggers through a pre-trained VAE. During model fine-tuning, the image encoder of the diffusion model is trained to associate these triggers with a predefined output, such as a black image, while a utility loss ensures the encoder maintains its functionality for clean images. At inference, when unauthorized users attempt to edit protected images, the embedded triggers activate the protective backdoor and the model produces meaningless outputs instead of the requested edits.
  • Figure 3: Qualitative comparison of different protection methods under various attack scenarios. GuardDoor's protective triggers remain imperceptible while effectively preventing unauthorized edits. In contrast, PhotoGuard begins to fail under attacks like DiffPure and JPEG compression, allowing the diffusion model to generate outputs resembling the original image.
  • Figure 4: Instructions provided to GPT-4o for evaluating the effectiveness of different protection methods. The model scores the protected images based on similarity to the original image and overall quality of the generated output.
  • Figure 5: Visualization of the noise pattern introduced by GuardDoor. From left to right: (1) Original image, (2) Protected image, (3) Noise pattern, and (4) Noise pattern after preprocessing. The noise pattern is imperceptible and remains effective even after common preprocessing techniques, ensuring robustness against unauthorized image modifications.
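The noise pattern in Figure 5 can be understood as the change the trigger makes to each pixel. The sketch below rests on three assumptions that the captions do not confirm. First, the pattern is the pixel-wise difference between the protected and original images. Second, it is amplified for display. Third, PSNR serves as a simple proxy for imperceptibility; the paper may use a different metric. The functions `noise_pattern`, `amplify`, and `psnr` are hypothetical helpers operating on toy grayscale images with values in [0, 1].

```python
import math
from typing import List

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]


def noise_pattern(original: Image, protected: Image) -> Image:
    """Assumed definition: pixel-wise residual (protected - original)."""
    return [[p - o for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, protected)]


def amplify(noise: Image, gain: float = 10.0) -> Image:
    """Scale the residual, re-center it at mid-gray, and clip to [0, 1] for viewing."""
    return [[min(1.0, max(0.0, 0.5 + gain * n)) for n in row] for row in noise]


def psnr(original: Image, protected: Image, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher values mean a less visible change."""
    flat = [n for row in noise_pattern(original, protected) for n in row]
    mse = sum(n * n for n in flat) / len(flat)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    original = [[0.5, 0.5], [0.5, 0.5]]
    protected = [[0.51, 0.49], [0.5, 0.5]]  # hypothetical small trigger
    print(noise_pattern(original, protected))
    print(round(psnr(original, protected), 2))
```

Amplification matters because a residual of about 0.01 per pixel is invisible at its native scale. Panel (3) of Figure 5 is likewise only legible once the pattern is rescaled.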