Attack as Defense: Run-time Backdoor Implantation for Image Content Protection

Haichuan Zhang, Meiyu Lin, Zhaoyi Liu, Renyuan Li, Zhiyuan Cheng, Carl Yang, Mingjie Tang

TL;DR

This work prevents the abuse of image content modification by implanting a backdoor into image-editing models, using the first framework for run-time backdoor implantation, which is both time- and resource-efficient.

Abstract

As generative models achieve great success, tampering with and modifying sensitive image content (e.g., human faces, artist signatures, and commercial logos) has become a significant threat with social impact. A backdoor attack implants vulnerabilities in a target model that can be activated through a trigger. In this work, we prevent the abuse of image content modification by implanting a backdoor into image-editing models. Once the protected sensitive content in an image is modified by an editing model, the backdoor is triggered, making the edit fail. Traditional backdoor attacks rely on data poisoning; to enable protection of individual images and eliminate the need for model training, we develop the first framework for run-time backdoor implantation, which is both time- and resource-efficient. We generate imperceptible perturbations on the images to inject the backdoor and define the protected area as the only backdoor trigger. Editing other, unprotected areas does not trigger the backdoor, which minimizes the negative impact on legitimate image modifications. Evaluations with state-of-the-art image-editing models show that, under malicious editing, our protective method can increase the CLIP-FID of generated images from 12.72 to 39.91, or reduce the SSIM from 0.503 to 0.167. At the same time, our method has minimal impact on benign editing, which demonstrates the efficacy of the proposed framework. The proposed run-time backdoor also achieves effective protection on the latest diffusion models. Code is available.
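For readers interpreting the SSIM figures quoted in the abstract, the standard definition of the structural similarity index between two image patches $x$ and $y$ is reproduced here. This is the general textbook formula, not something taken from the paper; which image pair is compared and which window size is used are not stated in this excerpt.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2
```

Here $\mu_x, \mu_y$ are the patch means, $\sigma_x^2, \sigma_y^2$ the variances, $\sigma_{xy}$ the covariance, and $L$ the dynamic range of the pixel values; $K_1 = 0.01$ and $K_2 = 0.03$ are the commonly used defaults. Identical patches give a value of 1, so a drop from 0.503 to 0.167 indicates that the compared images became much less structurally similar.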

Paper Structure

This paper contains 19 sections, 14 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Paradigm comparison of the traditional backdoor framework (top row) and the proposed run-time implant framework (bottom row). The traditional approach requires obtaining a compromised model via Trojan training prior to deployment, with the backdoor activated during the inference stage. In contrast, our run-time implant framework bypasses the need for prior poisoning, enabling the backdoor to be activated solely during inference. Conventional backdoors typically rely on an explicit trigger, whereas our method uses a region-aware trigger that is imperceptible and is activated by the editing location.
  • Figure 2: Optimization target of our run-time backdoor. We use three different edit regions as input to guide the optimization of the protective noise. In the first row, the entire trigger region is used to optimize the implant loss $\mathcal{L}_{implant}$. The second row uses an expanded trigger region to optimize the incomplete-activation loss $\mathcal{L}_{incomplete}$. The hide loss $\mathcal{L}_{hide}$ in the third row applies editing to regions without the trigger to minimize interference with benign modifications, thereby preserving the image's editability on non-trigger inputs.
  • Figure 3: Examples illustrating the qualitative resistance of implanted images to editing. The red-circled area highlights the inpainting result.
  • Figure 4: Example of qualitative ablation study on loss functions.
  • Figure 5: Example of qualitative ablation study on perturbation bound.
  • ...and 6 more figures
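The three-term objective described in the Figure 2 caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "editing model" is a toy linear inpainter (`edit`), the failure target is an all-zero output, the loss weights `LAM` and bound `EPS` are arbitrary, and plain projected gradient descent stands in for whatever optimizer the paper uses. The only ideas taken from the source are the three edit regions (trigger, expanded trigger, benign) and a bounded perturbation. NumPy is assumed to be installed.

```python
import numpy as np

def edit(image, mask, weights):
    """Toy linear stand-in for an inpainting model (hypothetical)."""
    context = image * (1.0 - mask)            # the model sees only unmasked pixels
    return context + mask * (weights @ context)

def jacobian(mask, weights):
    """Jacobian of edit() w.r.t. the image; exact because edit() is linear."""
    keep = np.diag(1.0 - mask)
    return keep + np.diag(mask) @ weights @ keep

def objective(delta, image, target, masks, weights, lam):
    """Weighted sum of implant, incomplete and hide losses, plus its gradient."""
    total, grad = 0.0, np.zeros_like(delta)
    for k, (mask, w) in enumerate(zip(masks, lam)):
        # Trigger and expanded-trigger edits should hit the failure target;
        # the benign edit (k == 2) should match the edit of the clean image.
        goal = edit(image, mask, weights) if k == 2 else target
        residual = mask * (edit(image + delta, mask, weights) - goal)
        total += 0.5 * w * (residual @ residual)
        grad += w * (jacobian(mask, weights).T @ residual)
    return total, grad

def protect(image, target, masks, weights, eps, lam, steps=200):
    """Projected gradient descent on an L-infinity-bounded perturbation."""
    hessian = sum(w * jacobian(m, weights).T @ np.diag(m) @ jacobian(m, weights)
                  for m, w in zip(masks, lam))
    step = 1.0 / np.linalg.eigvalsh(hessian).max()   # 1/L keeps descent monotone
    delta = np.zeros_like(image)
    for _ in range(steps):
        _, grad = objective(delta, image, target, masks, weights, lam)
        delta = np.clip(delta - step * grad, -eps, eps)   # project onto the eps-box
    return delta

def region(n, lo, hi):
    """Binary mask selecting pixels lo..hi-1 of a flattened n-pixel image."""
    mask = np.zeros(n)
    mask[lo:hi] = 1.0
    return mask

rng = np.random.default_rng(0)
N, EPS, LAM = 16, 0.05, (1.0, 1.0, 1.0)      # illustrative values only
image = rng.uniform(0.2, 0.8, N)
weights = rng.normal(0.0, 0.3, (N, N))
masks = (region(N, 0, 4), region(N, 0, 6), region(N, 10, 14))  # trigger, expanded, benign
target = np.zeros(N)                          # assumed "failed edit" output

delta = protect(image, target, masks, weights, EPS, LAM)
before, _ = objective(np.zeros(N), image, target, masks, weights, LAM)
after, _ = objective(delta, image, target, masks, weights, LAM)
print(f"objective before={before:.4f} after={after:.4f} "
      f"max|delta|={np.abs(delta).max():.3f}")
```

Because the toy model is linear, the objective is a convex quadratic and a step size of one over the largest Hessian eigenvalue guarantees that the objective never increases. A real diffusion or inpainting model would need automatic differentiation and would offer no such guarantee.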